DESCRIPTION

Protein C Production in Transgenic Animals

BACKGROUND OF THE INVENTION 

Protein C in its activated form plays an
important role in regulating blood coagulation.
Activated protein C, a serine protease, inactivates
coagulation Factors Va and VIIIa by limited proteolysis.
The coagulation cascade initiated by tissue injury, for
example, is prevented from proceeding in an unimpeded
chain reaction beyond the area of injury by activated
protein C.

Protein C is synthesized in the liver as a
single chain precursor polypeptide which is subsequently
processed to a light chain of about 155 amino acids
(Mr = 21,000) and a heavy chain of 262 amino acids
(Mr = 40,000). The heavy and light chains circulate in
the blood as a two-chain inactive protein, or zymogen,
held together by a disulfide bond. When a 12 amino acid
residue peptide is cleaved from the amino terminus of the
heavy chain portion of the zymogen in a reaction mediated
by thrombin, the protein becomes activated. The N-terminal
portion of the light chain contains nine γ-carboxyglutamic
acid (Gla) residues that are required for the
calcium-dependent membrane binding and activation of the
molecule. Another blood protein, referred to as
"protein S", is believed to accelerate the protein
C-catalyzed proteolysis of Factor Va.

Protein C has also been implicated in the action
of tissue-type plasminogen activator (Kisiel et al.,
Behring Inst. Mitt. 73:29-42, 1983). Infusion of bovine
activated protein C (APC) into dogs results in increased
plasminogen activator activity (Comp et al., J. Clin.
Invest. 68:1221-1228, 1981). Other studies (Sakata et
al., Proc. Natl. Acad. Sci. USA 82:1121-1125, 1985) have
shown that addition of APC to cultured endothelial cells
leads to a rapid, dose-dependent increase in fibrinolytic
activity in the conditioned media, reflecting increases in
the activity of both urokinase-related and tissue-type
plasminogen activators. APC treatment also results in a
dose-dependent decrease in anti-activator activity. In
addition, studies with monoclonal antibodies against
endogenous APC (Snow et al., FASEB Abstracts, 1988)
implicate APC in maintaining patency of arteries during
fibrinolysis and limiting the extent of tissue infarct.

Experimental evidence indicates that protein C
may be clinically useful in the treatment of thrombosis.
Several studies with baboon models of thrombosis have
indicated that activated protein C in low doses will be
effective in prevention of fibrin deposition, platelet
deposition and loss of circulation (Gruber et al.,
Hemostasis and Thrombosis 374a: abstract 1512, 1988;
Widrow et al., Fibrinolysis 2 suppl. 1: abstract 7, 1988;
Griffin et al., Thromb. Haemostasis 62: abstract 1512,
1989).

In addition, exogenous activated protein C has
been shown to prevent the coagulopathic and lethal effects
of gram negative septicemia (Taylor et al., J. Clin.
Invest. 79:918-925, 1987). Data obtained from studies
with baboons suggest that activated protein C plays a
natural role in protecting against septicemia.

Until recently, protein C was purified from
clotting factor concentrates (Marlar et al., Blood
59:1067-1072, 1982) or from plasma (Kisiel, J. Clin.
Invest. 64:761-769, 1979) and activated in vitro.
However, the possibility that the resulting product could
be contaminated with such infectious agents as hepatitis
virus, cytomegalovirus, or human immunodeficiency virus
(HIV) makes the process unfavorable.

While expression of protein C through
recombinant means has been theoretically possible, as the
genes for both human and bovine protein C are known
(Foster et al., Proc. Natl. Acad. Sci. USA 82:4673-4677,
1985; Foster et al., Proc. Natl. Acad. Sci. USA 81:4766-
4770, 1984 and U.S. Patent 4,775,624), it has been met
with limited success. Expression of some vitamin
K-dependent proteins, such as protein C, in cultured cells
has not yielded protein C that is both produced at
commercially valuable levels and biologically functional
when activated, i.e., having anticoagulant activity
(Grinnell et al., in Bruley and Drohan, eds., Protein C
and Related Anticoagulants: 29-63, Gulf Publishing,
Houston, TX and Grinnell et al., Bio/Technol. 5:1189-1192,
1987). Transgenic expression of protein C has yielded
somewhat higher levels of expression, but the recombinant
protein's anticoagulant activity has still remained low,
with less than 50% of the material having biological
activity (Velander et al., Proc. Natl. Acad. Sci. USA
89:12003-12007, 1992). Therefore, there remains a need for
producing protein C that is both expressed at high levels
and has therapeutic value.

SUMMARY OF THE INVENTION 

It is an object of the present invention to
provide methods for producing protein C in transgenic
animals. It is a further object to provide transgenic
animals that express human protein C in a mammary gland.

Within one aspect, the present invention
provides methods for producing protein C in a transgenic
animal comprising (a) providing a DNA construct comprising
a first DNA segment encoding a secretion signal and a
protein C propeptide operably linked to a second DNA
segment encoding protein C, wherein the encoded protein C
comprises a two-chain cleavage site modified from Lys-Arg
to R1-R2-R3-R4, and wherein each of R1-R4 is individually
Lys or Arg, and wherein said first and second segments are
operably linked to additional DNA segments required for
expression of the protein C DNA in a lactating mammary
gland of a host female animal; (b) introducing said DNA
construct into a fertilized egg of a non-human mammalian
species; (c) inserting said egg into an oviduct or uterus
of a female of said species to obtain offspring carrying
said DNA construct; (d) breeding said offspring to produce
female progeny that express said first and second DNA
segments and produce milk containing protein C encoded by
said second segment, wherein said protein has
anticoagulant activity upon activation; (e) collecting
milk from said female progeny; and (f) recovering the
protein C from the milk. In one embodiment, R1-R2-R3-R4
is Arg-Arg-Lys-Arg (SEQ ID NO: 20). In another
embodiment, the method further comprises the step of
activating the protein C. In another embodiment, the
non-human mammalian species is selected from sheep,
rabbits, cattle and goats. In another embodiment, each of
the first and second DNA segments comprises an intron. In
another embodiment, the second DNA segment comprises a DNA
sequence of nucleotides as shown in SEQ ID NO: 1 or SEQ ID
NO: 3. In another embodiment, the additional DNA segments
comprise a transcriptional promoter selected from the
group consisting of casein, β-lactoglobulin,
α-lactoglobulin, α-lactalbumin and whey acidic protein
gene promoters.

In another aspect, the present invention
provides a transgenic non-human female mammal that
produces recoverable amounts of human protein C in its
milk, wherein at least 90% of the human protein C in the
milk is two-chain protein C.

In another aspect, the present invention
provides a process for producing a transgenic offspring of
a mammal comprising the steps of (a) providing a DNA
construct comprising a first DNA segment encoding a
secretion signal and a protein C propeptide operably
linked to a second DNA segment encoding protein C, wherein
the encoded protein C comprises a two-chain cleavage site
modified from Lys-Arg to R1-R2-R3-R4, and wherein each of
R1-R4 is individually Lys or Arg, and wherein said first
and second segments are operably linked to additional DNA
segments required for expression of the protein C DNA in a
lactating mammary gland of a host female animal; (b)
introducing said DNA construct into a fertilized egg of a
non-human mammalian species; and (c) inserting said egg
into an oviduct or uterus of a female of said species to
obtain offspring carrying said DNA construct.

Within another aspect, the present invention
provides non-human mammals produced according to the
process for producing a transgenic offspring of a mammal
comprising the steps of (a) providing a DNA construct
comprising a first DNA segment encoding a secretion signal
and a protein C propeptide operably linked to a second DNA
segment encoding protein C, wherein the encoded protein C
comprises a two-chain cleavage site modified from Lys-Arg
to R1-R2-R3-R4, and wherein each of R1-R4 is individually
Lys or Arg, and wherein said first and second segments are
operably linked to additional DNA segments required for
expression of the protein C DNA in a lactating mammary
gland of a host female animal; (b) introducing said DNA
construct into a fertilized egg of a non-human mammalian
species; and (c) inserting said egg into an oviduct or
uterus of a female of said species to obtain offspring
carrying said DNA construct.

In another aspect, the present invention
provides a non-human mammalian embryo containing in its
nucleus a heterologous DNA segment encoding protein C,
wherein the encoded protein C comprises a two-chain
cleavage site modified from Lys-Arg to R1-R2-R3-R4, and
wherein each of R1-R4 is individually Lys or Arg.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 illustrates analysis of plasma-derived
and transgenic protein C run under non-reducing and
reducing conditions. Lane 1 is plasma-derived protein C
and lane 2 is transgenic protein C from the milk of sheep
30851.

Figure 2 illustrates sequencing of protein C
from sheep line 30851. The initial yields were
prosequence = 9 pmol, light chain = 563 pmol and heavy
chain = 565 pmol.

Figure 3 illustrates clotting activity of 
transgenic protein C compared to plasma-derived protein C. 

DETAILED DESCRIPTION OF THE INVENTION

Prior to setting forth the invention in detail, 
it will be helpful to define certain terms used herein: 

As used herein, the term "biologically active"
is used to denote protein C that is characterized by its
anticoagulant and fibrinolytic properties. Protein C,
when activated, inactivates factor Va and factor VIIIa in
the presence of phospholipid and calcium. Activated
protein C also enhances fibrinolysis, an effect believed
to be mediated by the lowering of the levels of
plasminogen activator inhibitors. As stated previously,
two-chain protein C is activated upon cleavage of a 12
amino acid peptide from the amino terminus of the heavy
chain portion of the zymogen.

The term "egg" is used to denote an unfertilized
ovum, a fertilized ovum prior to fusion of the pronuclei
or an early stage embryo (fertilized ovum with fused
pronuclei).

A "female mammal that produces milk containing
biologically active protein C" is one that, following
pregnancy and delivery, produces, during the lactation
period, milk containing recoverable amounts of protein C
that can be activated to be biologically active. Those
skilled in the art will recognize that such animals will
naturally produce milk, and therefore the protein C,
discontinuously.

The term "progeny" is used in its usual sense to
include offspring and descendants.

The term "heterologous" is used to denote
genetic material originating from a different species than
that into which it has been introduced, or a protein
produced from such genetic material.

Within the present invention, transgenic animal
technology is employed to produce protein C within a
mammary gland of a host female mammal. Expression in the
mammary gland and subsequent secretion of the protein of
interest into the milk overcomes many difficulties
encountered in isolating proteins from other sources.
Milk is readily collected, available in large quantities,
and well characterized biochemically. Furthermore, the
major milk proteins are present in milk at high
concentrations (from about 1 to 16 g/l).

From a commercial point of view, it is clearly
preferable to use as the host a species that has a large
milk yield. While smaller animals such as mice and rats
can be used (and are preferred at the proof-of-concept
stage), within the present invention it is preferred to
use livestock mammals including sheep and cattle. Sheep
are particularly preferred due to such factors as the
previous history of transgenesis in this species, milk
yield, generation time, cost and the ready availability of
equipment for collecting sheep milk. It is generally
desirable to select a breed of host animal that has been
bred for dairy use, such as East Friesland sheep, or to
introduce dairy stock by breeding of the transgenic line
at a later date. In any event, animals of known, good
health status should be used.

Cloned DNA sequences encoding human protein C
have been described (Foster and Davie, Proc. Natl. Acad.
Sci. USA 81:4766-4770, 1984; Foster et al., Proc. Natl.
Acad. Sci. USA 82:4673-4677, 1985; and Bang et al., U.S.
Patent 4,775,624, each incorporated herein by reference).
Complementary DNAs (cDNAs) encoding protein C can be
obtained from libraries prepared from liver cells of
various mammalian species according to standard laboratory
procedures. DNAs encoding protein C from other species,
such as rats, pigs, sheep, cows and primates, can be
used and can be identified using probes from human cDNA.

In a preferred embodiment, human genomic DNAs
encoding protein C are used. The human protein C gene is
composed of nine exons ranging in size from 25 to 885
nucleotides, and seven introns ranging in size from 92 to
2668 nucleotides (U.S. Patent 4,959,318, incorporated
herein by reference). The first exon is non-coding and
referred to as exon 0. Exon I and a portion of exon II
code for the 42 amino acid signal sequence and propeptide
(i.e., pre-propeptide). The remaining portion of exon II,
exon III, exon IV, exon V and a portion of exon VI code
for the light chain of protein C. The remaining portion
of exon VI, exon VII and exon VIII code for the heavy
chain of protein C. A representative human genomic DNA
sequence and corresponding amino acid sequence are shown
in SEQ ID NOS: 1 and 2, respectively. A representative
human protein C cDNA sequence and corresponding amino acid
sequence are shown in SEQ ID NOS: 3 and 4, respectively.
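
The exon-to-region layout described in the preceding paragraph can be written down as a small lookup table. The following Python sketch is illustrative only: the names `PROTEIN_C_EXON_MAP` and `exons_encoding` are hypothetical and introduced here, and the table records only which exons contribute to which region as stated in the text, not nucleotide coordinates.

```python
# Illustrative lookup table (hypothetical names): which regions of the
# protein C precursor each exon contributes to, as described in the text.
PROTEIN_C_EXON_MAP = {
    "exon 0": [],                                # non-coding
    "exon I": ["pre-propeptide"],
    "exon II": ["pre-propeptide", "light chain"],
    "exon III": ["light chain"],
    "exon IV": ["light chain"],
    "exon V": ["light chain"],
    "exon VI": ["light chain", "heavy chain"],
    "exon VII": ["heavy chain"],
    "exon VIII": ["heavy chain"],
}


def exons_encoding(region):
    """Return the exons that encode at least part of the given region."""
    return [exon for exon, regions in PROTEIN_C_EXON_MAP.items()
            if region in regions]


if __name__ == "__main__":
    for region in ("pre-propeptide", "light chain", "heavy chain"):
        print(region, "->", ", ".join(exons_encoding(region)))
```

Exons II and VI each appear under two regions because, per the text, each is split between adjacent portions of the precursor.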

Those skilled in the art will recognize that
naturally occurring allelic variants of these sequences
will exist; that additional variants can be generated by
amino acid substitution, deletion, or insertion; and that
such variants are useful within the present invention. In
general, it is preferred that any engineered variants
comprise only a limited number of amino acid
substitutions, deletions, or insertions, and that any
substitutions are conservative. Thus, it is preferred to
produce protein C polypeptides that are at least 90%, and
more preferably at least 95% or more identical in sequence
to the corresponding native protein.
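
The 90% and 95% identity thresholds above can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the text does not say how identity is to be computed, so the function below assumes two sequences that have already been aligned to equal length (gaps written as "-"), and counts identical non-gap positions over the alignment length. The function name and the 20-residue toy sequences are hypothetical and are not taken from SEQ ID NO: 2 or 4.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumption (not from the source text): a position counts as
    identical only if both residues match and neither is a gap ("-");
    the denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    native = "ACDEFGHIKLMNPQRSTVWY"    # toy sequence, hypothetical
    variant = "ACDEFGHIKLMNPQRSTVWF"   # one substitution at the last position
    identity = percent_identity(native, variant)
    print(f"identity: {identity:.1f}%")
    print("meets 90% threshold:", identity >= 90.0)
    print("meets 95% threshold:", identity >= 95.0)
```

With one substitution in twenty positions the toy variant sits exactly at the 95% boundary; a second substitution would leave it at 90%.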

Within the present invention, the proteolytic
processing involved in the maturation of recombinant
protein C from single chain form to the two-chain form
(i.e., cleaved between the light chain and the heavy
chain) has been enhanced by modifying the amino acid
sequence around the two-chain cleavage site. In the
normal situation, endoproteolytic cleavage of the
precursor molecule at the Arg157-Asp158 bond and the
removal of the dipeptide Lys156-Arg157 by a
carboxypeptidase activity generate the light and heavy
chains of protein C prior to secretion. Expression of
protein C with the native (Lys-Arg) two-chain cleavage
site produces protein C that may contain up to 40% or more
uncleaved, single-chain protein C (Grinnell et al., in
Protein C and Related Anticoagulants, eds., Bruley and
Drohan, Gulf, Houston, pp. 29-63, 1990; Suttie, Thromb.
Res. 44:129-134, 1986 and Yan et al., Trends Biochem. Sci.
14:264-268, 1989). The single-chain form of protein C may
not be able to be activated. The cleavage site may be in
the form of the amino acid sequence R1-R2-R3-R4, wherein
each of R1 through R4 is individually lysine (Lys) or
arginine (Arg). Particularly preferred sequences include
Arg-Arg-Lys-Arg (SEQ ID NO: 20) and Lys-Arg-Lys-Arg (SEQ
ID NO: 21).
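
Because each of R1 through R4 is independently Lys or Arg, the formula R1-R2-R3-R4 covers 2^4 = 16 tetrapeptides. The short Python sketch below enumerates them and flags the two sequences named as particularly preferred. It is an illustration only; the function and variable names are hypothetical and introduced here.

```python
from itertools import product

BASIC_RESIDUES = ("Lys", "Arg")

# The two sequences the text names as particularly preferred.
PREFERRED_SITES = {
    "Arg-Arg-Lys-Arg": "SEQ ID NO: 20",
    "Lys-Arg-Lys-Arg": "SEQ ID NO: 21",
}


def cleavage_site_variants():
    """Return every R1-R2-R3-R4 site with each position Lys or Arg."""
    return ["-".join(combo) for combo in product(BASIC_RESIDUES, repeat=4)]


if __name__ == "__main__":
    variants = cleavage_site_variants()
    print(len(variants), "possible sites")
    for site in variants:
        label = PREFERRED_SITES.get(site, "")
        print(site, label)
```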

In a preferred embodiment, the present invention
provides for recoverable amounts of human protein C in the
milk of a non-human mammal, where at least 90%, preferably
at least 95%, of the human protein C is two-chain protein
C.

To obtain expression in the mammary gland, a
transcription promoter from a milk protein gene is used.
Milk protein genes include those genes encoding caseins,
beta-lactoglobulin (BLG), α-lactalbumin, and whey acidic
protein. The beta-lactoglobulin promoter is preferred.
In the case of the ovine beta-lactoglobulin gene, a region
of at least the proximal 406 bp of 5' flanking sequence of
the ovine BLG gene (contained within nucleotides 3844 to
4257 of SEQ ID NO: 5) will generally be used. Larger
portions of the 5' flanking sequence, up to about 5 kb,
are preferred. A larger DNA segment encompassing the 5'
flanking promoter region and the region encoding the 5'
non-coding portion of the beta-lactoglobulin gene
(contained within nucleotides 1 to 4257 of SEQ ID NO: 5)
is particularly preferred. See Whitelaw et al., Biochem.
J. 286:31-39, 1992. Similar fragments of promoter DNA
from other species are also suitable.

Other regions of the beta-lactoglobulin gene may
also be incorporated in constructs, as may genomic regions
of the gene to be expressed. It is generally accepted in
the art that constructs lacking introns, for example,
express poorly in the transgenic lactating mammary gland
in comparison with those constructs that contain introns
(see Brinster et al., Proc. Natl. Acad. Sci. USA 85:836-
840, 1988; Palmiter et al., Proc. Natl. Acad. Sci. USA
88:478-482, 1991; Whitelaw et al., Transgenic Res. 1:3-13,
1991; WO 89/01343; WO 91/02318). In this regard, it is
generally preferred, where possible, to use genomic
sequences containing all or some of the native introns of
a gene encoding protein C. Within certain embodiments of
the invention, the further inclusion of at least some
introns from the beta-lactoglobulin gene is preferred.
One such region is a DNA segment which provides for intron
splicing and RNA polyadenylation from the 3' non-coding
region of the ovine beta-lactoglobulin gene. When
substituted for the natural 3' non-coding sequences of a
gene, this ovine beta-lactoglobulin segment can both
enhance and stabilize expression levels of the protein C.

For expression of protein C, DNA segments
encoding protein C are operably linked to additional DNA
segments required for their expression to produce
expression units. One such additional segment is the
above-mentioned milk protein gene promoter. Sequences
allowing for termination of transcription and
polyadenylation of mRNA may also be incorporated. Such
sequences are well known in the art; for example, one such
termination sequence is the "upstream mouse sequence"
(McGeady et al., DNA 5:289-298, 1986). The expression
units will further include a DNA segment encoding a
secretion signal operably linked to the segment encoding
the protein C polypeptide chain. The secretion signal may
be a native protein C secretion signal or may be that of
another protein, such as a milk protein. The term
"secretion signal" is used herein to denote that portion
of a protein that directs it through the secretory pathway
of a cell to the outside. Secretion signals are most
commonly found at the amino termini of proteins. See, for
example, von Heijne, Nuc. Acids Res. 14:4683-4690, 1986;
and Meade et al., U.S. Patent No. 4,873,316, which are
incorporated herein by reference.

Construction of expression units is conveniently
carried out by inserting a protein C sequence into a
plasmid or phage vector containing the additional DNA
segments, although the expression unit may be constructed
by essentially any sequence of ligations. It is
particularly convenient to provide a vector containing a
DNA segment encoding a milk protein and to replace the
coding sequence for the milk protein with that of a
protein C (including a secretion signal), thereby creating
a gene fusion that includes the expression control
sequences of the milk protein gene. In any event, cloning
of the expression units in plasmids or other vectors
facilitates the amplification of the protein C sequences.
Amplification is conveniently carried out in bacterial
(e.g. E. coli) host cells; thus the vectors will typically
include an origin of replication and a selectable marker
functional in bacterial host cells.

The expression unit is then introduced into
fertilized eggs (including early-stage embryos) of the
chosen host species. Introduction of heterologous DNA can
be accomplished by one of several routes, including
pronuclear microinjection (e.g. U.S. Patent No.
4,873,191), retroviral infection (Jaenisch, Science 240:
1468-1474, 1988) or site-directed integration using
embryonic stem (ES) cells (reviewed by Bradley et al.,
Bio/Technology 10:534-539, 1992). The eggs are then
implanted into the oviducts or uteri of pseudopregnant
females and allowed to develop to term. Offspring
carrying the introduced DNA in their germ line can pass
the DNA on to their progeny in the normal, Mendelian
fashion, allowing the development of transgenic herds.
General procedures for producing transgenic animals are
known in the art. See, for example, Hogan et al.,
Manipulating the Mouse Embryo: A Laboratory Manual, Cold
Spring Harbor Laboratory, 1986; Simons et al.,
Bio/Technology 6:179-183, 1988; Wall et al., Biol.
Reprod. 32:645-651, 1985; Buhler et al., Bio/Technology
8:140-143, 1990; Ebert et al., Bio/Technology 9:835-838,
1991; Krimpenfort et al., Bio/Technology 9:844-847, 1991;
Wall et al., J. Cell. Biochem. 49:113-120, 1992; WIPO
publications WO 88/00239, WO 90/05188, and WO 92/11757;
and GB 87/00458, which are incorporated herein by
reference.
Techniques for introducing foreign DNA sequences into
mammals and their germ cells were originally developed in
the mouse. See, e.g., Gordon et al., Proc. Natl. Acad.
Sci. USA 77:7380-7384, 1980; Gordon and Ruddle, Science
214:1244-1246, 1981; Palmiter and Brinster, Cell 41:343-
345, 1985; Brinster et al., Proc. Natl. Acad. Sci. USA
82:4438-4442, 1985; and Hogan et al. (ibid.). These
techniques were subsequently adapted for use with larger
animals, including livestock species (see e.g., WIPO
publications WO 88/00239, WO 90/05188, and WO 92/11757;
and Simons et al., Bio/Technology 6:179-183, 1988). To
summarize, in the most efficient route used to date in the
generation of transgenic mice or livestock, several
hundred linear molecules of the DNA of interest are
injected into one of the pronuclei of a fertilized egg.
Injection of DNA into the cytoplasm of a zygote can also
be employed.

In general, female animals are superovulated by
treatment with follicle stimulating hormone, then mated.
Fertilized eggs are collected, and the heterologous DNA is
injected into the eggs using known methods. See, for
example, U.S. Patent No. 4,873,191; Gordon et al., Proc.
Natl. Acad. Sci. USA 77:7380-7384, 1980; Gordon and
Ruddle, Science 214:1244-1246, 1981; Palmiter and
Brinster, Cell 41:343-345, 1985; Brinster et al., Proc.
Natl. Acad. Sci. USA 82:4438-4442, 1985; Hogan et al.,
Manipulating the Mouse Embryo: A Laboratory Manual, Cold
Spring Harbor Laboratory, 1986; Simons et al.,
Bio/Technology 6:179-183, 1988; Wall et al., Biol.
Reprod. 32:645-651, 1985; Buhler et al., Bio/Technology
8:140-143, 1990; Ebert et al., Bio/Technology 9:835-838,
1991; Krimpenfort et al., Bio/Technology 9:844-847, 1991;
Wall et al., J. Cell. Biochem. 49:113-120, 1992; WIPO
publications WO 88/00239, WO 90/05188, and WO 92/11757;
and GB 87/00458, which are incorporated herein by
reference.

For injection into fertilized eggs, the
expression units are removed from their respective vectors
by digestion with appropriate restriction enzymes. For
convenience, it is preferred to design the vectors so that
the expression units are removed by cleavage with enzymes
that do not cut either within the expression units or
elsewhere in the vectors. The expression units are
recovered by conventional methods, such as electro-elution
followed by phenol extraction and ethanol precipitation,
sucrose density gradient centrifugation, or combinations
of these approaches.

DNA is injected into eggs essentially as
described in Hogan et al., ibid. In a typical injection,
eggs in a dish of an embryo culture medium are located
using a stereo zoom microscope (x50 or x63 magnification
preferred). Suitable media include Hepes
(N-2-hydroxyethylpiperazine-N'-2-ethanesulphonic acid) or
bicarbonate buffered media such as M2 or M16 (available
from Sigma Chemical Co., St. Louis, USA) or synthetic
oviduct medium (disclosed below). The eggs are secured
and transferred to the center of a glass slide on an
injection rig using, for example, a Drummond pipette
complete with capillary tube. Viewing at lower (e.g. x4)
magnification is used at this stage.

WO 97/20043    PCT/US96/18866

Using the holding
pipette of the injection rig, the eggs are positioned
centrally on the slide. Individual eggs are sequentially
secured to the holding pipette for injection. For each
injection process, the holding pipette/egg is positioned
in the center of the viewing field. The injection needle
is then positioned directly below the egg. Preferably
using x40 Nomarski objectives, both manipulator heights
are adjusted to focus both the egg and the needle. The
pronuclei are located by rotating the egg and adjusting
the holding pipette assembly as necessary. Once the
pronucleus has been located, the height of the manipulator
is altered to focus the pronuclear membrane. The
injection needle is positioned below the egg such that the
needle tip is in a position below the center of the
pronucleus. The position of the needle is then altered
using the injection manipulator assembly to bring the
needle and the pronucleus into the same focal plane. The
needle is moved, via the joystick on the injection
manipulator assembly, to a position to the right of the
egg. With a short, continuous jabbing movement, the
pronuclear membrane is pierced to leave the needle tip
inside the pronucleus. Pressure is applied to the
injection needle via, for example, a glass syringe until
the pronucleus swells to approximately twice its volume.
At this point, the needle is slowly removed. Reverting to
lower (e.g. x4) magnification, the injected egg is moved
to a different area of the slide, and the process is
repeated with another egg.

After the DNA is injected, the eggs may be
cultured to allow the pronuclei to fuse, producing one-
cell or later stage embryos. In general, the eggs are
cultured at approximately the body temperature of the
species used in a buffered medium containing balanced
salts and serum. Surviving embryos are then transferred
to pseudopregnant recipient females, typically by
inserting them into the oviduct or uterus, and allowed to
develop to term. During embryogenesis, some of the
injected DNA integrates in a random fashion in the genomes
of a small number of the developing embryos.

Potential transgenic offspring are screened via
blood samples and/or tissue biopsies. DNA is prepared
from these samples and examined for the presence of the
injected construct by techniques such as polymerase chain
reaction (PCR; see Mullis, U.S. Patent No. 4,683,202) and
Southern blotting (Southern, J. Mol. Biol. 98:503, 1975;
Maniatis et al., Molecular Cloning: A Laboratory Manual,
Cold Spring Harbor Laboratory, 1982). Founder transgenic
animals, or G0s, may be wholly transgenic, having
transgenes in all of their cells, or mosaic, having
transgenes in only a subset of cells (see, for example,
Wilkie et al., Develop. Biol. 118:9-18, 1986). In the
latter case, groups of germ cells may be wholly or
partially transgenic, and the number of transgenic
progeny from a founder animal will be less than
the expected 50% predicted from Mendelian principles.

Founder G0 animals are grown to sexual maturity and mated
to obtain offspring, or G1s. The G1s are also examined
for the presence of the transgene to demonstrate
transmission from founder G0 animals. In the case of male
G0s, these may be mated with several non-transgenic
females to generate many offspring. This increases the
chances of observing transgene transmission. Female G0
founders may be mated naturally, artificially inseminated
or superovulated to obtain many eggs which are transferred
to surrogate mothers. The latter course gives the best
chance of observing transmission in animals having a
limited number of young. The above-described breeding
procedures are used to obtain animals that can pass the
DNA on to subsequent generations of offspring in the
normal, Mendelian fashion, allowing the development of,
for example, colonies (mice), flocks (sheep), or herds
(pigs, goats and cattle) of transgenic animals.
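
The reasoning about mosaic founders and the benefit of generating many offspring can be made concrete with a small calculation. The sketch below is illustrative and rests on simplifying assumptions that are not stated in the text: a single hemizygous integration site, a fraction `germline_fraction` of the founder's germ cells carrying the transgene, mating to a non-transgenic partner, and independent offspring. The function names are hypothetical.

```python
def expected_transgenic_fraction(germline_fraction):
    """Expected fraction of transgenic progeny from a founder.

    Assumes one hemizygous integration site, so a fully transgenic
    germ line (germline_fraction = 1.0) gives the Mendelian 50%.
    """
    if not 0.0 <= germline_fraction <= 1.0:
        raise ValueError("germline_fraction must be between 0 and 1")
    return 0.5 * germline_fraction


def prob_at_least_one_transgenic(germline_fraction, n_offspring):
    """Probability that at least one of n offspring carries the transgene."""
    if n_offspring < 0:
        raise ValueError("n_offspring must be non-negative")
    p = expected_transgenic_fraction(germline_fraction)
    return 1.0 - (1.0 - p) ** n_offspring


if __name__ == "__main__":
    # Hypothetical mosaic founder with 20% transgenic germ cells.
    for n in (1, 5, 20):
        chance = prob_at_least_one_transgenic(0.2, n)
        print(f"{n:2d} offspring: P(at least one transgenic) = {chance:.2f}")
```

Under these assumptions a mosaic founder transmits to fewer than 50% of its progeny, and the chance of observing transmission rises with the number of offspring screened, which is the rationale given above for mating male G0s to several females.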
The milk from lactating G0 and G1 females is
examined for the expression of the heterologous protein
using immunological techniques such as ELISA (see Harlow
and Lane, Antibodies: A Laboratory Manual, Cold Spring
Harbor Laboratory, 1988) and Western blotting (Towbin et
al., Proc. Natl. Acad. Sci. USA 76:4350-4354, 1979). For
a variety of reasons known in the art, expression levels
of the heterologous protein will be expected to differ
between individuals.

A satisfactory family of animals should satisfy
three criteria: they should be derived from the same
founder G0 animal; they should exhibit stable transmission
of the transgene; and they should exhibit acceptably
stable expression levels from generation to generation and
from lactation to lactation of individual animals. These
principles have been demonstrated and discussed (Carver et
al., Bio/Technology 11:1263-1270, 1993). Animals from
such a suitable family are referred to as a "line."
Initially, male animals, G0 or G1, are used to derive a
flock or herd of producer animals by natural or artificial
insemination. In this way, many female animals containing
the same transgene integration event can be quickly
generated from which a supply of milk can be obtained.
The protein C is recovered from milk using
standard practices such as skimming, precipitation,
filtration and protein chromatography techniques.

Protein C produced according to the present
invention can be activated by removal of the activation
peptide from the amino terminus of the heavy chain.
Activation can be achieved using methods that are well
known in the art, for example, using α-thrombin (Marlar et
al., Blood 59:1067-1072, 1982), trypsin (Marlar et al.,
1982, ibid.), Russell's viper venom factor X activator
(Kisiel, J. Clin. Invest. 64:761-769, 1979) or
commercially available Protac C (American Diagnostica, NY,
NY).

The protein C molecules provided by the present
invention and pharmaceutical compositions thereof are
particularly useful for administration to humans to treat
a variety of conditions involving intravascular
coagulation. For instance, although deep vein thrombosis
and pulmonary embolism can be treated with conventional
anticoagulants, the activated protein C described herein
may be used to prevent the occurrence of thromboembolic
complications in identified high risk patients, such as
those undergoing surgery or those with congestive heart
failure. Since activated protein C is more selective than
heparin, being active in the body generally when and where
thrombin is generated and fibrin thrombi are formed,
activated protein C will be more effective and less likely
to cause bleeding complications than heparin when used
prophylactically for the prevention of deep vein
thrombosis. The dose of activated protein C for
prevention of deep vein thrombosis is in the range of
about 100 µg to 100 mg/day, and administration should
begin at least about 6 hours prior to surgery and continue
at least until the patient becomes ambulatory. In
established deep vein thrombosis and/or pulmonary
embolism, the dose of activated protein C ranges from
about 100 µg to 100 mg as a loading dose followed by
maintenance doses ranging from 3 to 300 mg/day. Because
of the lower likelihood of bleeding complications from
activated protein C infusions, activated protein C can
replace or lower the dose of heparin during or after
surgery in conjunction with thrombectomies or
embolectomies.

The activated protein C compositions of the
present invention will also have substantial utility in
the prevention of cardiogenic emboli and in the treatment
of thrombotic strokes. Because of its low potential for
causing bleeding complications and its selectivity,
activated protein C can be given to stroke victims and may
prevent the extension of the occluding arterial thrombus.
The amount of activated protein C administered will vary
with each patient depending on the nature and severity of
the stroke, but doses will generally be in the range of
those suggested below.
Pharmaceutical compositions of activated protein
C provided herein will be a useful treatment in acute
myocardial infarction because of the ability of activated
protein C to enhance in vitro fibrinolysis. Activated
protein C can be given with tissue plasminogen activator
or streptokinase during the acute phases of the myocardial
infarction. After the occluding coronary thrombus is
dissolved, activated protein C can be given for subsequent
days or weeks to prevent coronary reocclusion. In acute
myocardial infarction, the patient is given a loading dose
of at least about 1-500 mg of activated protein C,
followed by maintenance doses of 1-100 mg/day.

Activated protein C is useful in the treatment
of disseminated intravascular coagulation (DIC). Patients
with DIC characteristically have widespread
microcirculatory thrombi and often severe bleeding
problems which result from consumption of essential
clotting factors. Because of its selectivity, activated
protein C will not aggravate the bleeding problems
associated with DIC, as do conventional anticoagulants,
but will retard or inhibit the formation of additional
microvascular fibrin deposits.

The invention is further illustrated by the
following non-limiting examples.

EXAMPLES
Example 1

A. Vector pMAD6 Construction

The multiple cloning site of the vector pUC18
(Yanisch-Perron et al., Gene 33:103-119, 1985) was removed
and replaced with a synthetic double stranded
oligonucleotide (the strands of which are shown in SEQ ID
NO: 6 and SEQ ID NO: 7) containing the restriction sites
Pvu I/Mlu I/Eco RV/Xba I/Pvu I/Mlu I, and flanked by 5'
overhangs compatible with the restriction sites Eco RI and
Hind III. pUC18 was cleaved with both Eco RI and Hind
III, the 5' terminal phosphate groups were removed with
calf intestinal phosphatase, and the oligonucleotide was
ligated into the vector backbone. The DNA sequence across
the junction was confirmed by sequencing, and the new
plasmid was called pUCPM.

The β-lactoglobulin (BLG) gene sequences from
pSS1tgXS (disclosed in WIPO publication WO 88/00239) were
excised as a Sal I-Xba I fragment and recloned into the
vector pUCPM that had been cut with Sal I and Xba I to
construct vector pUCXS. pUCXS is thus a pUC18 derivative
containing the entire BLG gene from the Sal I site to the
Xba I site of phage SS1 (Ali and Clark, J. Mol. Biol. 199:
415-426, 1988).

The plasmid pSS1tgSE (disclosed in WIPO
publication WO 88/00239) contains a 1290 bp BLG fragment
flanked by Sph I and Eco RI restriction sites, a region
spanning a unique Not I site and a single Pvu II site
which lies in the 5' untranslated leader of the BLG mRNA.
Into this Pvu II site was ligated a double stranded, 8 bp
DNA linker (5'-GGATATCC-3') encoding the recognition site
for the enzyme Eco RV. This plasmid was called
pSS1tgSE/RV. DNA sequences bounded by Sph I and Not I
restriction sites in pSS1tgSE/RV were excised by enzymatic
digestion and used to replace the equivalent fragment in
pUCXS. The resulting plasmid was called pUCXSRV. The
sequence of the BLG insert in pUCXSRV is shown in SEQ ID
NO: 5, with the unique Eco RV site at nucleotide 4245 in
the 5' untranslated leader region of the BLG gene. This
site allows insertion of any additional DNA sequences
under the control of the BLG promoter 3' to the
transcription initiation site.

Using the primers BLGAMP3 (5'-TGG ATC CCC TGC
CGG TGC CTC TGG-3'; SEQ ID NO: 8) and BLGAMP4 (5'-AAC GCG
TCA TCC TCT GTG AGC CAG-3'; SEQ ID NO: 9) a PCR fragment
of approximately 650 bp was produced from sequences
immediately 3' to the stop codon of the BLG gene in
pUCXSRV. The PCR fragment was engineered to have a Bam HI
site at its 5' end and an Mlu I site at its 3' end and was
cloned as such into Bam HI and Mlu I cut pGEM7zf(+)
(Promega) to give pDAM200(+).
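
The engineered restriction sites can be located directly in the two primer sequences given above. The following is a minimal sketch, not part of the original procedure; it relies only on the standard recognition sequences for Bam HI (GGATCC) and Mlu I (ACGCGT), and the helper name `find_sites` is an illustrative choice.

```python
# Primer sequences as given in the text (spaces removed).
PRIMERS = {
    "BLGAMP3": "TGGATCCCCTGCCGGTGCCTCTGG",  # SEQ ID NO: 8
    "BLGAMP4": "AACGCGTCATCCTCTGTGAGCCAG",  # SEQ ID NO: 9
}

# Standard recognition sequences of the two enzymes.
SITES = {"Bam HI": "GGATCC", "Mlu I": "ACGCGT"}


def find_sites(seq: str) -> dict:
    """Map each enzyme whose site occurs in seq to the 0-based start index."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for name, seq in PRIMERS.items():
    print(name, len(seq), "nt", find_sites(seq))
```

Each primer carries exactly one of the two sites one nucleotide from its 5' end, which is consistent with a PCR product bounded by a Bam HI site on one side and an Mlu I site on the other.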

pUCXSRV was digested with Kpn I, and the
largest, vector-containing band was gel purified. This
band contained the entire pUC plasmid sequences and some
3' non-coding sequences from the BLG gene. Into this
backbone was ligated the small Kpn I fragment from
pDAM200(+) which, in the correct orientation, effectively
engineered a Bam HI site at the extreme 5' end of the 2.6
Kbp of the BLG 3' flanking region. This plasmid was
called pBLAC200. A 2.6 Kbp Cla I-Xba I fragment from
pBLAC200 was ligated into Cla I-Xba I cut pSP72 vector
(Promega), thus placing an Eco RV site immediately
upstream of the BLG sequences. This plasmid was called
pBLAC210.

The 2.6 Kbp Eco RV-Xba I fragment from pBLAC210
was ligated into Eco RV-Xba I cut pUCXSRV to form pMAD6
(SEQ ID NO: 23). This, in effect, excised all coding and
intron sequences from pUCXSRV, forming a BLG minigene
consisting of 4.2 Kbp of 5' promoter and 2.6 Kbp of 3'
downstream sequences flanking a unique Eco RV site. An
oligonucleotide linker (ZC6839: ACTACGTAGT; SEQ ID NO: 10)
was inserted into the Eco RV site of pMAD6 (SEQ ID NO:
23). This modification destroyed the Eco RV site and
created a Sna BI site to be used for cloning purposes.
The vector was designated pMAD6-Sna. Messenger RNA
initiates upstream of the Sna BI site and terminates
downstream of the Sna BI site. The precursor transcript
will encode a single BLG-derived intron, intron 6, which
is entirely within the 3' untranslated region of the gene.
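
The effect of the ZC6839 linker on the Eco RV site can be checked at the sequence level. The sketch below is illustrative only. It assumes the standard recognition sequences (Eco RV: GATATC, cut blunt between GAT and ATC; Sna BI: TACGTA) and considers only the site itself, not the flanking pMAD6 sequence.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

ECO_RV = "GATATC"      # blunt cut between GAT and ATC
SNA_BI = "TACGTA"
LINKER = "ACTACGTAGT"  # ZC6839, SEQ ID NO: 10


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def insert_at_blunt_eco_rv(linker: str) -> str:
    """Return the local sequence after ligating linker into a cut Eco RV site."""
    half = len(ECO_RV) // 2
    return ECO_RV[:half] + linker + ECO_RV[half:]


junction = insert_at_blunt_eco_rv(LINKER)
print(junction)                               # GATACTACGTAGTATC
print("linker is palindromic:", reverse_complement(LINKER) == LINKER)
print("Eco RV site retained:", ECO_RV in junction)
print("Sna BI site created:", SNA_BI in junction)
```

Because the linker is its own reverse complement, the same junction results whichever way it ligates, so the outcome does not depend on insert orientation.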
B. Intronless Vector pMAD

The beta-lactoglobulin cloning vector pMAD was
also constructed to allow the insertion of cDNAs under the
control of the beta-lactoglobulin gene promoter in
constructs containing no introns. To generate pMAD, the
plasmid pBLAC100 was opened by digestion with both Eco RV
and Sal I. The vector fragment was gel purified and the
linearized vector was ligated with the 4.2 kb promoter
fragment from the plasmid pUCXSRV as a Sal I-Eco RV
fragment. The resulting construct was designated pST1 and
constitutes a beta-lactoglobulin mini-gene encompassing
4.2 kb of promoter region and 2.1 kb of 3' non-coding
region beginning immediately downstream of the
beta-lactoglobulin translational termination codon. A unique
Eco RV site allows blunt-end cloning of any additional DNA
sequences. To generate transgenic animals it is generally
accepted in the art and preferred to separate bacterial
plasmid vector sequences from those intended to be used in
the generation of transgenic animals. In order to allow
the practical excision of novel cDNA based constructs
using this beta-lactoglobulin mini-gene, the minigene was
excised from pST1 on a Xho I-Not I fragment, the DNA
termini made flush with Klenow polymerase and the product
was ligated into the Eco RV site of pUCPM to yield pMAD.
Digestion with Mlu I liberates beta-lactoglobulin-cDNA
constructs from the bacterial vector backbone.

Intronless constructs based on cDNAs and vectors
such as pMAD benefit from the use of "rescue technology"
for efficient expression. Rescue technology takes
advantage of the ability of a co-injected and
co-integrated BLG gene to improve the expression levels
obtained from intronless, cDNA-based constructs in the
transgenic system. Rescue technology is disclosed in WIPO
publication WO 92/11358, and is incorporated herein by
reference.

Example 2

A. Isolation of cDNA

A cDNA sequence coding for human protein C was
prepared as described in U.S. Patent 4,959,318, which is
incorporated herein by reference. Briefly, a genomic
fragment containing an exon corresponding to amino acids
-42 to -19 (SEQ ID NO: 1) of the pre-pro peptide of protein
C was isolated, nick translated and used as a probe to
screen a cDNA library constructed by the technique of
Gubler and Hoffman, Gene 25:263-269, 1983, using mRNA from
HepG2 cells. This cell line was derived from human
hepatocytes and was previously shown to synthesize protein
C (Fair and Bahnak, Blood 64:194-204, 1984). Positive
clones comprising cDNA inserted into the Eco RI site of
phage λgt11 were isolated and screened with an
oligonucleotide probe corresponding to the 5' non-coding
region of the protein C gene. One clone was also positive
with this probe and its entire nucleotide sequence was
determined. The cDNA contained 70 bp of 5' untranslated
sequence, the entire coding sequence for human prepro-
protein C, and the entire 3' non-coding region
corresponding to the second polyadenylation site.

B. Subcloning of Protein C cDNA

The vector pDX was derived from pD3, which was
generated from plasmid pDHFRIII (Berkner et al., Nuc.
Acids Res. 13:841-857, 1985). The Pst I site immediately
upstream from the DHFR sequence in pDHFRIII was converted
to a Bcl I site by digestion with Pst I. The DNA was
phenol extracted, ethanol precipitated and resuspended in
buffer B (50 mM Tris pH 8, 7 mM MgCl2, 7 mM β-MSH). A
ligation reaction containing the linearized plasmid DNA
and Bcl I linkers was done. The resulting plasmid was
phenol extracted, ethanol precipitated and digested with
Bcl I and gel purified. The gel purified plasmid DNA was
circularized by ligation and used to transform E. coli
HB101. Positive colonies were identified by restriction
analysis and designated pDHFR'. DNA from positive
colonies was isolated and used to transform dam- E. coli.

Plasmid pD2' was generated by cleaving pDHFR'
and pSV40 (comprising Bam HI digested SV40 DNA cloned into
the Bam HI site of pML-1 (Lusky et al., Nature 293:79-81,
1981)) with Bcl I and Bam HI. The DNA fragments were
resolved by gel electrophoresis, and the 4.9 kb pDHFR'
fragment and 0.2 kb SV40 fragment were isolated. These
fragments were used in a ligation reaction, and the
resulting plasmid, designated pD2', was used to transform
E. coli RR1.

Plasmid pD2' was modified by deleting the
"poison" sequences in the pBR322 region (Lusky et al.,
1981, ibid.). Plasmids pD2' and pML-1 were digested with
Eco RI and Nru I. The 1.7 kb pD2' fragment and 1.8 kb
pML-1 fragment were isolated by gel purification,
circularized in a ligation reaction and used to transform
E. coli HB101. Positive colonies were identified using
restriction analysis (designated pD2) and digested with
Eco RI and Bcl I. A 2.8 kb fragment (fragment C) was
isolated and gel purified.

To generate the remaining fragments used in
constructing pD3, pDHFRIII was modified to convert the Sac
II (Sst II) site into either a Hind III or Kpn I site.
pDHFRIII was digested with Sst II and ligation reactions
with either Hind III or Kpn I linkers were done. The
resultant plasmids were digested with either Hind III or
Kpn I and gel purified. The resultant plasmids were
designated either pDHFRIII (Hind III) or pDHFRIII (Kpn I).
A 700 bp Kpn I-Bgl II fragment (fragment A) was purified
from pDHFRIII (Hind III).

The SV40 enhancer sequence was inserted into
pDHFRIII (Hind III) by first digesting SV40 DNA with Hind
III, and DNA from 5089 to 968 bp was isolated and
purified. Plasmid pDHFRIII (Hind III) was phosphatased,
and the SV40 DNA and linearized plasmid pDHFRIII (Hind
III) were used in a ligation reaction. A 700 bp
Eco RI-Kpn I fragment (fragment B) was isolated from the
resulting plasmid.

WO 97/20043    PCT/US96/18866

For the final construction of pD3, fragments A
(50 ng), B (50 ng) and C (10 ng) were combined in a
ligation reaction and used to transform E. coli RR1.
Positive colonies were isolated and plasmid DNA was
prepared.
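
The fragment masses used in this ligation imply a molar excess of the two small fragments over the large one. The following sketch works this out from the sizes stated above (fragments A and B about 700 bp each, fragment C about 2.8 kb). The figure of 650 g/mol per base pair is a commonly used approximation and is an assumption here, not a value from this specification.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair (approximation, assumed)

# (mass in ng, length in bp) for each fragment, as stated in the text
FRAGMENTS = {
    "A": (50.0, 700),
    "B": (50.0, 700),
    "C": (10.0, 2800),
}


def picomoles(mass_ng: float, length_bp: int) -> float:
    """Convert a double-stranded DNA mass in ng to pmol of molecules."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


amounts = {name: picomoles(mass, length) for name, (mass, length) in FRAGMENTS.items()}
for name, pmol in amounts.items():
    ratio = pmol / amounts["C"]
    print(f"fragment {name}: {pmol:.4f} pmol ({ratio:.0f}x relative to C)")
```

The ratio does not depend on the assumed mass per base pair, since that constant cancels; under the stated sizes, fragments A and B are each present at roughly a twenty-fold molar excess over fragment C.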

Plasmid pD3 was modified to accept the insertion
of the protein C sequence by converting the Bcl I
insertion site to an Eco RI site. First, the Eco RI site
present in pD3 (the leftmost terminus in adenovirus 5 0-1)
was converted to a Bam HI site via conventional linkering
procedures. The resultant plasmid was transformed in E.
coli HB101. Plasmid DNA was prepared, and positive clones
were identified by restriction analysis.

pD3' is a vector identical to pD3 except that
the SV40 polyadenylation signal (i.e., the SV40 Bam HI
(2533 bp) to Bcl I (2770 bp) fragment) is in the late
orientation. Thus, pD3' contains a Bam HI site as the
site of gene insertion.

To generate pDX, the Eco RI site in pD3' was
converted to a Bcl I site by Eco RI cleavage, incubation
with S1 nuclease and subsequent ligation with Bcl I
linkers. DNA was prepared from a positively identified
colony, and a 1.9 kb Xho I-Pst I fragment containing the
altered restriction site was prepared via gel
purification. In a second modification, Bcl I-cleaved pD3
was ligated with Eco RI-Bcl I adapters in order to
generate an Eco RI site as the position for inserting a
gene into the expression vector. Positive colonies were
identified by restriction analysis. The resulting
plasmid, designated pDX, has a unique Eco RI site for
insertion of foreign genes.

The protein C cDNA was inserted into pDX as an
Eco RI fragment. Plasmids were screened by restriction
analysis. A plasmid having the protein C insert in the
correct orientation with respect to the promoter elements
was identified, and plasmid DNA was designated pDX/PC.
Because the cDNA insert in pDX/PC contains an ATG codon in
the 5' non-coding region, deletion mutagenesis was
performed on the cDNA. Deletion of the three base pairs
was performed according to standard procedures of
oligonucleotide-directed mutagenesis. The pDX-based
vector containing the modified cDNA was designated p594.

C. Modification of the Protein C Processing Site

To enhance the processing of single-chain
protein C to the two-chain form, two additional arginine
residues were introduced immediately upstream of the
Lys156-Arg157 cleavage site of the precursor protein,
resulting in a cleavage site consisting of four basic
amino acids, Arg-Arg-Lys-Arg (SEQ ID NO: 20). The
resultant mutant precursor of protein C was designated
PC962. It contains the sequence
Ser-His-Leu-Arg-Arg-Lys-Arg-Asp (SEQ ID NO: 22) at the
cleavage site. Processing at the Arg-Asp bond results in
a two-chain protein C molecule.

The mutant molecule was generated by altering
the cloned cDNA by site-specific mutagenesis (essentially
as described by Zoller and Smith, DNA 3:479-488, 1984, for
the two-primer method) using the mutagenic oligonucleotide
ZC962 (5' AGTCACCTGAGAAGAAAACGAGACA 3'; SEQ ID NO: 11).
Plasmid p594 was digested with Sst I, and the
approximately 87 bp fragment was cloned into M13mp11 and
single-stranded template DNA was isolated. Following
mutagenesis, a correct clone was identified by sequencing.
Replicative form DNA was isolated, digested with Sst I,
and the protein C fragment was inserted into Sst I-cut
p594. Clones having the Sst I fragment inserted in the
desired orientation were identified by restriction enzyme
mapping. The resulting expression vector was designated
pDX/PC962.
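
The relationship between the mutagenic oligonucleotide and the modified cleavage site can be illustrated by translating ZC962 with the standard genetic code. This is a minimal sketch that assumes the oligonucleotide is read in frame from its first base; that reading frame is an assumption, supported here only by the fact that the result matches the peptide sequence stated above.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "*": "Stop",
}


def translate(dna: str) -> str:
    """Translate complete codons from the first base; ignore trailing bases."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "-".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


ZC962 = "AGTCACCTGAGAAGAAAACGAGACA"  # SEQ ID NO: 11
print(translate(ZC962))  # Ser-His-Leu-Arg-Arg-Lys-Arg-Asp
```

The two AGA codons in the middle of the oligonucleotide encode the added arginine residues immediately upstream of the Lys-Arg pair.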
D. Intronless Protein C Construct

To facilitate the cloning of the protein C cDNA,
PC962, into pMAD, the cDNA contained in pDX/PC962 was
modified to incorporate Eco RV sites at the extremities of
the protein C cDNA insert. A 769 bp Sst II-Pst I fragment
encompassing the 3' end of PC962 was cloned between the
Sst II and Pst I sites of pBluescript II SK® (Stratagene,
La Jolla, CA). The fragment was excised with Sst II and
Eco RV and purified. The 5' portion of PC962 was modified
by PCR. The sense oligonucleotide primer for this
reaction covered the 5' ATG region of the cDNA and
provided an Eco RV site upstream of this in the product.
The antisense oligonucleotide primer covered the Sst II
site used to generate the Sst II-Eco RV fragment. The
resulting PCR product was digested with Eco RV and Sst II
and ligated with the Sst II-Eco RV 3' fragment and Eco RV
digested pMAD. The resulting plasmid, designated pCORP9,
effectively contained the PC962 cDNA flanked by Eco RV
sites in an intronless fusion driven by the
beta-lactoglobulin promoter.

E. Genomic Protein C DNA Construction

A genomic DNA construct containing exons I
through VIII was made. See U.S. Patent 4,959,318, which
is incorporated herein by reference, for disclosure of the
exon structure of the protein C gene. This genomic
construct, designated GPC10-1, changed the sequence 16
base pairs upstream of the ATG from the native protein C
sequence to the beta-lactoglobulin sequence and introduced
mutations in the propeptide cleavage site located in exon
2, and the two-chain cleavage site located in exon 6, as
described below.

The construct was assembled using four fragments
designated A, B, C and D and encompassed the protein C
gene sequence from the ATG to a Bam HI site in exon VIII,
immediately upstream of the stop codon. The fragments
were generated from a human genomic library in λ Charon 4A
phage that was screened with a radiolabeled cDNA probe for
human protein C. The screening of the λ library produced
three clones that together mapped the entire protein C
gene (Foster et al., 1985, ibid.). These clones were
designated PCλ1, PCλ6 and PCλ8.
Fragment A was a Not I to Eco RI fragment that
contained exons I and II of the genomic sequence and was
1698 bp. A subclone of PCλ6 contained an Eco RI to Eco RI
fragment and was designated pHCR4.4-1. Using pHCR4.4-1 as
a template and oligonucleotides ZC6303 (SEQ ID NO: 12) and
ZC6337 (SEQ ID NO: 13), a DNA fragment was generated by
polymerase chain reaction (PCR). Oligonucleotide ZC6303
changed the sequence 16 base pairs 5' to the ATG sequence
from the native protein C sequence to the equivalent
sequence from the beta-lactoglobulin gene and introduced a
Not I site. Oligonucleotide ZC6337 changed the propeptide
cleavage site from Arg-Ile-Arg-Lys-Arg (SEQ ID NO: 24) to
Gln-Arg-Arg-Lys-Arg (SEQ ID NO: 25). The resulting
PCR-generated fragment was digested with Not I and Bss HII,
and a 1402 base pair fragment was gel purified and
designated A1. A second fragment was prepared using a
λgt11 clone of PCλ1 as a template with oligonucleotides
ZC6306 (SEQ ID NO: 14) and ZC6338 (SEQ ID NO: 15) in a
polymerase chain reaction. The resulting DNA fragment,
designated A3, was digested with Bss HII and Eco RI and
gel purified, resulting in a 296 base pair fragment.
Fragments A1 and A3 were ligated into the Bluescript II
KS® phagemid vector (Stratagene, La Jolla, CA). The
resulting plasmid, designated GPC 2-2, was digested with
Not I and Eco RI, gel purified and the Not I-Eco RI DNA
fragment was designated Fragment A.
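
The stated sizes of the two sub-fragments can be checked against the stated size of Fragment A. The short sketch below assumes that the reported lengths are counted so that bases at the shared Bss HII junction are not counted twice; the text does not say how the lengths were measured, so this is only a consistency check.

```python
# Sub-fragment sizes in base pairs, as stated in the text.
SUBFRAGMENTS_BP = {
    "A1 (Not I to Bss HII)": 1402,
    "A3 (Bss HII to Eco RI)": 296,
}
REPORTED_FRAGMENT_A_BP = 1698


def assembled_length(parts: dict) -> int:
    """Sum the lengths of fragments joined end to end."""
    return sum(parts.values())


total = assembled_length(SUBFRAGMENTS_BP)
print(total, "bp; matches reported Fragment A:", total == REPORTED_FRAGMENT_A_BP)
```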

pCR 2-14 is a subclone that contains an Eco RI
to Eco RI DNA fragment of PCλ8 (Foster et al., 1985,
ibid.). The plasmid was digested with Eco RI and Sst I
and gel purified. The resulting fragment was designated
Fragment B.

Plasmid pCR 2-14 was used as template DNA with
oligonucleotides ZC6373 (SEQ ID NO: 16) and ZC6305 (SEQ ID
NO: 17), which introduced an Afl II site and the RRKR
mutation of the native (KR) two-chain cleavage site, in a
polymerase chain reaction. The resulting PCR-generated
fragment was digested with Bgl II and Afl II and gel
purified, resulting in a 1441 base pair fragment,
designated E1. Fragment E1 was used in a ligation
reaction with oligonucleotides ZC6302 (SEQ ID NO: 18) and
ZC6304 (SEQ ID NO: 19). These oligonucleotides form Afl
II and Sst II restriction sites when annealed and were
ligated to the 3' end of fragment E1, resulting in a
fragment with a 5' Bgl II site and a 3' Sst II site. This
fragment was used in a ligation reaction with a Bam HI-Sst
II digested Bluescript II KS® phagemid vector
(Stratagene). The resulting plasmid was designated GPC
8-5 and digested with Sst I and Sst II, generating a 626
base pair fragment, designated Fragment C.

A fourth fragment was generated by digestion of
a genomic subclone (pHCB7-1) of PCλ8. pHCB7-1 contained a
Bgl II to Bgl II fragment that encompassed exons VI
through VIII. pHCB7-1 was digested with Sst II and Bam HI
and a 2702 base pair fragment was gel purified. The
fragment was designated Fragment D.

A five-part ligation reaction was prepared using
Not I and Bam HI digested and linearized Bluescript II KS®
phagemid vector (Stratagene) with Fragment A (5' Not I to
3' Eco RI) that contained exons I and II, Fragment B (5'
Eco RI to 3' Sst I) that contained exons III, IV and V,
Fragment C (5' Sst I to 3' Sst II) that contained the 5'
portion of exon VI and Fragment D (5' Sst II to 3' Bam HI)
that contained the remaining 3' portion of exon VI and
exons VII and VIII. The resulting DNA was 8950 base pairs
and designated GPC 10-1.

GPC10-1 was originally generated with BLG
sequences and a Not I site upstream of the ATG initiator
codon and modifications to both cleavage sites. A clone,
designated pPC12/BS, was generated to ensure that the 5'
Not I site of GPC10-1 would not introduce secondary
structure into mRNA molecules that could hinder
translation. pPC12/BS was generated using PCR
amplification of a 1 kb Not I-Sca I fragment that covered
the 5' region of the protein C gene and contained the
wild-type ATG codon environment. This introduced an Eco
RV site immediately downstream of the Not I site, adjacent
to the ATG codon, and a Bam HI site was incorporated 3' of
the Sca I site to facilitate cloning. Following a Not
I/Bam HI digestion, the PCR product was cloned into Not
I-Bam HI digested Bluescript II KS® phagemid vector
(Stratagene). The Not I-Eco RV-Sca I fragment present in
pPC12/BS was excised, purified and ligated to GPC10-1,
which had been linearized with Not I and partially
digested with Sca I (the pUC ampicillin gene has an
internal Sca I site). The resulting clone was designated
GPC10-2 and possesses an Eco RV site immediately upstream
of the ATG initiator codon.

GPC10-1 and GPC10-2 both terminated at the final
Bam HI site in exon VIII of the protein C gene. To
reconstitute the 56 bp of sequence, ending at the
termination codon, two oligonucleotides were synthesized
with flanking Bam HI (5') and Bgl II (3') restriction
sites. Following annealing of the oligonucleotides, the
product was cloned into Bam HI digested pBST+ to generate
plasmid pPC3'. pBST+ is a derivative of pBS (Stratagene)
with a new polylinker. The addition of the polylinker
added Bgl II, Xho I, Nar I and Cla I restriction sites
from the vector polylinker downstream of the destroyed Bgl
II site of the oligonucleotide construct.

The Not I-Bam HI fragment of GPC10-1 was
subcloned into Not I/Bam HI digested pPC3' to add 3'
coding sequences of protein C, the TAG termination codon
followed by Bgl II-Xho I-Nar I-Cla I. The 3' region of
the protein C gene, beginning with the Eco RV site in
intron V, was excised from this plasmid on an Eco RV-Cla I
fragment.

The Eco RV-Eco RV fragment from GPC10-2,
covering the 5' portion of the protein C gene, and the
above Eco RV-Cla I fragment covering the 3' portion of the
protein C gene were combined between the Eco RV and Cla I
sites of pMAD6 (SEQ ID NO: 23) to generate pCORP13. This
effectively placed a genomic portion of the protein C gene
with modified propeptide and two-chain cleavage sites
under the control of the beta-lactoglobulin promoter.

A further genomic construct was generated from
pCORP13 that contained only the modified two-chain
cleavage site. This was achieved using PCR amplification
to modify two fragments, which resulted in restoration of
the coding capability of exon 2 from the mutant
Gln-Arg-Arg-Lys-Arg (SEQ ID NO: 25) to the wild-type
Arg-Ile-Arg-Lys-Arg (SEQ ID NO: 24). pCORP13 was used as
template for these reactions. The first fragment was 1.3
kb, which encompassed the 5' end of the protein C gene up
to the Bam HI site in exon 2. For this reason, the sense
primer was designed to add a Hind III site 5' to the Eco
RV site proximal to the ATG initiation codon. The
antisense primer was designed to restore the wild-type
sequences in exon 2, which included a restored Bam HI
site. A second fragment of 0.2 kb, from the Bam HI site in
exon 2 to the Xho I site in intron 2, was also amplified.
The two fragments were combined in pGEMII (Promega,
Madison, WI) to generate pGEMPC1.5. A 7.5 kb Xho I
fragment from pCORP13 was ligated to Xho I digested
pGEMPC1.5 to generate a complete protein C genomic
sequence covering exons 1-8 with a wild-type propeptide
cleavage site and a modified two-chain cleavage site. The
plasmid was designated pGEMPC14. The sequence was excised
from pGEMPC14 as a Hind III/Sal I fragment. The DNA
termini were repaired using a Klenow reaction and the
fragment was blunt-end ligated into Eco RV digested pMAD6
(SEQ ID NO: 23) to generate pCORP14.

Example 3

Mice for initial breeding stocks (C57BL6J,
CBACA) were obtained from Harlan Olac Ltd. (Bicester, UK).
These were mated in pairs to produce F1 hybrid cross
(B6CBAF1) for recipient females, superovulated females,
stud males and vasectomized males. All animals were kept
on a 14 hour light/10 hour dark cycle and fed water and
food (Special Diet Services RM3, Edinburgh, Scotland) ad
libitum.

Transgenic mice were generated essentially as
described in Hogan et al., Manipulating the Mouse Embryo:
A Laboratory Manual, Cold Spring Harbor Laboratory, 1986,
which is incorporated herein by reference in its entirety.
Female B6CBAF1 animals were superovulated at 4-5 weeks of
age by an i.p. injection of pregnant mares' serum
gonadotrophin (FOLLIGON, Vet-Drug, Falkirk, Scotland) (5
iu) followed by an i.p. injection of human chorionic
gonadotrophin (CHORULON, Vet-Drug, Falkirk, Scotland) (5
iu) 45 hours later. They were then mated with a stud male
overnight. Such females were next examined for copulation
plugs. Those that had mated were sacrificed, and their
eggs were collected for microinjection.

DNA was injected into the fertilized eggs as
described in Hogan et al. (ibid.). Briefly, the vector
containing the protein C expression unit was digested with
Mlu I, and the expression unit was isolated by sucrose
gradient centrifugation. All chemicals used were reagent
grade (Sigma Chemical Co., St. Louis, MO, U.S.A.), and all
solutions were sterile and nuclease-free. Solutions of
20% and 40% sucrose in 1 M NaCl, 20 mM Tris pH 8.0, 5 mM
EDTA were prepared using UHP water and filter sterilized.
A 30% sucrose solution was prepared by mixing equal
volumes of the 20% and 40% solutions. A gradient was
prepared by layering 0.5 ml steps of the 40%, 30% and 20%
sucrose solutions into a 2 ml polyallomer tube and was
allowed to stand for one hour. 100 µl of DNA solution
(max. 8 µg DNA) was loaded onto the top of the gradient,
and the gradient was centrifuged for 17-20 hours at 26,000
rpm, 15°C in a Beckman TL100 ultracentrifuge using a
TLS-55 rotor (Beckman Instruments, Fullerton, CA, USA).
Gradients were fractionated by puncturing the tube bottom
with a 20 ga. needle and collecting drops in a 96 well
microtiter plate. 3 µl aliquots were analyzed on a 1%
agarose mini-gel. Fractions containing the protein C DNA
fragment were pooled and ethanol precipitated overnight at
-20°C in 0.3 M sodium acetate. DNA pellets were
resuspended in 50-100 µl UHP water and quantitated by
fluorimetry. The protein C expression unit was diluted in
Dulbecco's phosphate buffered saline without calcium and
magnesium (containing, per liter, 0.2 g KCl, 0.2 g KH2PO4,
8.0 g NaCl, 1.15 g Na2HPO4) or in TE (10 mM Tris-HCl, 1 mM
EDTA pH 7.5). DNA concentration is adjusted to about 6
µg/ml, prior to injection into the eggs (~2 pl total DNA
solution per egg).
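
The injection parameters above fix the mass of DNA delivered per egg, and, for a fragment of known length, the approximate number of copies. The sketch below is illustrative only. The 10,000 bp fragment length is a hypothetical value chosen for the example, since the length of the Mlu I expression unit is not stated here, and 650 g/mol per base pair is a common approximation.

```python
AVOGADRO = 6.022e23   # molecules per mole
AVG_BP_MASS = 650.0   # g/mol per base pair (approximation, assumed)


def injected_mass_fg(conc_ug_per_ml: float, volume_pl: float) -> float:
    """Mass of DNA injected, in femtograms (1 ug/ml equals 1 fg/pl)."""
    return conc_ug_per_ml * volume_pl


def copies_injected(mass_fg: float, length_bp: int) -> float:
    """Approximate number of double-stranded DNA molecules in mass_fg."""
    grams = mass_fg * 1e-15
    return grams / (length_bp * AVG_BP_MASS) * AVOGADRO


mass = injected_mass_fg(conc_ug_per_ml=6.0, volume_pl=2.0)  # values from the text
hypothetical_length_bp = 10_000  # assumption, not from the specification
print(f"{mass:.0f} fg DNA per egg")
print(f"about {copies_injected(mass, hypothetical_length_bp):.0f} copies "
      f"for a {hypothetical_length_bp} bp fragment")
```

At the stated concentration and volume, each egg receives about 12 fg of DNA; the copy number scales inversely with the actual fragment length.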

Recipient females of 6-8 weeks of age are
prepared by mating B6CBAF1 females in natural estrus with
vasectomized males. Females possessing copulation plugs
are then kept for transfer of microinjected eggs.

Following birth of potential transgenic animals,
tail biopsies are taken, under anesthesia, at four weeks
of age. Tissue samples are placed in 2 ml of tail buffer
(0.3 M Na acetate, 50 mM NaCl, 1.5 mM MgCl2, 10 mM Tris-
HCl, pH 8.5, 0.5% NP40, 0.5% Tween 20) containing 200
µg/ml proteinase K (Boehringer Mannheim, Mannheim,
Germany) and vortexed. The samples are shaken (250 rpm)
at 55°-60°C for 3 hours to overnight. DNA prepared from
biopsy samples is examined for the presence of the
injected constructs by PCR and Southern blotting. The
digested tissue is vigorously vortexed, and 5 µl aliquots
are placed in 0.5 ml microcentrifuge tubes. Positive and
negative tail samples are included as controls. Forty µl
of silicone oil (BDH, Poole, UK) is added to each tube,
and the tubes are briefly centrifuged. The tubes are
incubated in the heating block of a thermal cycler (e.g.
Omni-gene, Hybaid, Teddington, UK) at 95°C for 10 minutes.
Following this, a 45 µl aliquot of PCR mix is added to
each tube such that the final composition of each reaction
mix is: 50 mM KCl; 2 mM MgCl2; 10 mM Tris-HCl (pH 8.3);
0.01% gelatin; 0.1% NP40; 10% DMSO; 500 nM each primer;
200 µM dNTPs; 0.02 U/µl Taq polymerase (Boehringer
Mannheim, Mannheim, Germany). The tubes are then cycled
through 30 repeated temperature changes as required by the
particular primers used. The primers may be varied but in
all cases must target the BLG promoter region. This is
specific for the injected DNA fragments because the mouse
does not have a BLG gene. Twelve µl of 5x loading buffer
containing Orange G marker dye (0.25% Orange G (Sigma),
15% Ficoll type 400 (Pharmacia Biosystems Ltd., Milton
Keynes, UK)) is then added to each tube, and the reaction
mixtures are electrophoresed on a 1.6% agarose gel
containing ethidium bromide (Sigma) until the marker dye
has migrated 2/3 of the length of the gel. The gel is
visualized with a UV light source emitting a wavelength of
254 nm. Transgenic mice having one or more of the injected
DNA fragments are identified by this approach.
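
Because the PCR mix is added as a 45 µl aliquot to a 5 µl sample,
the mix itself must be slightly more concentrated than the stated
final composition. The following sketch works through that
arithmetic. It assumes, for simplicity, that the 5 µl of digested
tissue contributes none of the listed components and that the
silicone oil overlay does not mix with the aqueous phase; the
function and variable names are illustrative.

```python
SAMPLE_UL = 5.0    # digested tail sample per tube
MIX_UL = 45.0      # PCR mix added to each tube
FINAL_UL = SAMPLE_UL + MIX_UL

# Final composition of each reaction, as stated in the text.
FINAL_COMPOSITION = {
    "KCl (mM)": 50.0,
    "MgCl2 (mM)": 2.0,
    "Tris-HCl pH 8.3 (mM)": 10.0,
    "gelatin (%)": 0.01,
    "NP40 (%)": 0.1,
    "DMSO (%)": 10.0,
    "each primer (nM)": 500.0,
    "dNTPs (uM)": 200.0,
    "Taq polymerase (U/ul)": 0.02,
}


def mix_concentration(final_conc: float,
                      final_ul: float = FINAL_UL,
                      mix_ul: float = MIX_UL) -> float:
    """Concentration needed in the mix to reach final_conc after dilution."""
    return final_conc * final_ul / mix_ul


def loading_buffer_final_x(stock_x: float, buffer_ul: float,
                           reaction_ul: float) -> float:
    """Final strength of loading buffer after adding it to the reaction."""
    return stock_x * buffer_ul / (buffer_ul + reaction_ul)


if __name__ == "__main__":
    for name, conc in FINAL_COMPOSITION.items():
        print(f"{name}: final {conc}, in mix {mix_concentration(conc):.4g}")
    taq_units = FINAL_COMPOSITION["Taq polymerase (U/ul)"] * FINAL_UL
    print(f"Taq per reaction: {taq_units:.2f} U")
    print(f"Loading buffer: {loading_buffer_final_x(5.0, 12.0, FINAL_UL):.2f}x")
```

Every component is scaled by the same factor of 50/45, and twelve µl
of 5x loading buffer in a 50 µl reaction comes out close to 1x.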

Positive tail samples are processed to obtain
pure DNA. The DNA samples are screened by Southern
blotting using a BLG promoter probe (nucleotides 2523-4253
of SEQ ID NO: 7).

Southern blot analysis of transgenic mice
prepared essentially as described above demonstrated that
approximately 10% of progeny contained protein C
sequences. Examination of milk from positive animals by
reducing SDS polyacrylamide gel electrophoresis
demonstrated the presence of protein C at concentrations
up to 1 mg/ml.

Example 4 

Donor ewes are treated with an intravaginal
progesterone-impregnated sponge (CHRONOGEST Goat Sponge,
Intervet, Cambridge, UK) on day 0. Sponges are left in
situ for ten or twelve days.

Superovulation is induced by treatment of donor
ewes with a total of one unit of ovine follicle
stimulating hormone (OFSH) (OVAGEN, Horizon Animal
Reproduction Technology Pty. Ltd., New Zealand)
administered in eight intramuscular injections of 0.125
units per injection starting at 5:00 pm on day -4 and
ending at 8:00 am on day 0. Donors are injected
intramuscularly with 0.5 ml of a luteolytic agent
(ESTRUMATE, Vet-Drug) on day -4 to cause regression of the
corpus luteum, to allow return to estrus and ovulation.
To synchronize ovulation, the donor animals are injected
intramuscularly with 2 ml of a synthetic releasing hormone
analog (RECEPTAL, Vet-Drug) at 5:00 pm on day 0.
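
The dosing arithmetic can be checked with a short sketch. The text
gives only the first injection time (5:00 pm on day -4), the last
(8:00 am on day 0), the number of injections, and the dose per
injection. The twice-daily 8:00 am and 5:00 pm spacing used below is
an assumption made for illustration; it happens to yield exactly
eight slots between the stated endpoints.

```python
def fsh_schedule(total_units: float = 1.0,
                 first_day: int = -4,
                 last_day: int = 0) -> list:
    """Return (day, time, dose) tuples for a twice-daily schedule.

    Assumes morning and evening injections on each day, starting with
    the evening of first_day and ending with the morning of last_day.
    """
    slots = [(day, time)
             for day in range(first_day, last_day + 1)
             for time in ("8:00 am", "5:00 pm")]
    slots = slots[1:-1]  # drop the first morning and the last evening
    dose = total_units / len(slots)
    return [(day, time, dose) for day, time in slots]


if __name__ == "__main__":
    for day, time, dose in fsh_schedule():
        print(f"day {day:+d} {time}: {dose} units")
```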

Donors are starved of food and water for at
least 12 hours before artificial insemination (A.I.). The
animals are artificially inseminated by intrauterine
laparoscopy under sedation and local anesthesia on day 1.
Either xylazine (ROMPUN, Vet-Drug) at a dose rate of 0.05-
0.1 ml per 10 kg bodyweight or ACP injection 10 mg/ml
(Vet-Drug) at a dose rate of 0.1 ml per 10 kg bodyweight
is injected intramuscularly approximately fifteen minutes
before A.I. to provide sedation. A.I. is carried out
using freshly collected semen from a Poll Dorset ram.
Semen is diluted with equal parts of filtered phosphate
buffered saline, and 0.2 ml of the diluted semen is
injected per uterine horn. Immediately pre- or post-A.I.,
donors are given an intramuscular injection of AMOXYPEN
(Vet-Drug).

Fertilized eggs are recovered on day 2 following
starvation of donors of food and water from 5:00 pm on day
1. Recovery is carried out under general anesthesia
induced by an intravenous injection of 5% thiopentone
sodium (INTRAVAL SODIUM, Vet-Drug) at a dose rate of 3 ml
per 10 kg bodyweight. Anesthesia is maintained by
inhalation of 1-2% halothane/O2/N2O. To recover the
fertilized eggs, a laparotomy incision is made, and the
uterus is exteriorized. The eggs are recovered by
retrograde flushing of the oviducts with Ovum Culture
Medium (Advanced Protein Products, Brierly Hill, West
Midlands, UK) supplemented with bovine serum albumin of
New Zealand origin. After flushing, the uterus is
returned to the abdomen, and the incision is closed.
Donors are allowed to recover post-operatively or are
euthanized. Donors that are allowed to recover are given
an intramuscular injection of Amoxypen L.A. at the
manufacturer's recommended dose rate immediately pre- or
post-operatively.

Plasmids containing the protein C DNA are
digested with Mlu I, and the expression unit fragments are
recovered and purified on sucrose density gradients. The
fragment concentrations are determined by fluorimetry, and
the fragments are diluted in Dulbecco's phosphate buffered
saline without calcium and magnesium or TE as described
above. The concentration is adjusted to 6 µg/ml, and
approximately 2 pl of the mixture is microinjected into
one pronucleus of each fertilized egg with visible
pronuclei.

All fertilized eggs surviving pronuclear
microinjection are cultured in vitro at 38.5°C in an
atmosphere of 5% CO2:5% O2:90% N2 and about 100% humidity
in a bicarbonate buffered synthetic oviduct medium (see
Table) supplemented with 20% v/v vasectomized ram serum.
The serum may be heat inactivated at 56°C for 30 minutes
and stored frozen at -20°C prior to use. The fertilized
eggs are cultured for a suitable period of time to allow
early embryo mortality (caused by the manipulation
techniques) to occur. These dead or arrested embryos are
discarded. Embryos having developed to 5 or 6 cell
divisions are transferred to synchronized recipient ewes.

Table

Synthetic Oviduct Medium

Stock A (Lasts 3 Months)
    NaCl                          6.29 g
    KCl                           0.534 g
    KH2PO4                        0.162 g
    MgSO4.7H2O                    0.182 g
    Penicillin                    0.06 g
    Sodium Lactate 60% syrup      0.6 mls
    Super H2O                     99.4 mls

Stock B (Lasts 2 weeks)
    NaHCO3                        0.21 g
    Phenol red                    0.001 g
    Super H2O                     10 mls

Stock C (Lasts 2 weeks)
    Sodium Pyruvate               0.051 g
    Super H2O                     10 mls

Stock D (Lasts 3 months)
    CaCl2.2H2O                    0.262 g
    Super H2O                     10 mls

Stock E (Lasts 3 months)
    Hepes                         0.651 g
    Phenol red                    0.001 g
    Super H2O                     10 mls

To make up 10 mls of Bicarbonate Buffered Medium
    STOCK A                       1 ml
    STOCK B                       1 ml
    STOCK C                       0.07 ml
    STOCK D                       0.1 ml
    Super H2O                     7.83 ml

Osmolarity should be 265-285 mOsm.
Add 2.5 ml of heat inactivated sheep serum
and filter sterilize.

To make up 10 mls of HEPES Buffered Medium
    STOCK A                       1 ml
    STOCK B                       0.2 ml
    STOCK C                       0.07 ml
    STOCK D                       0.1 ml
    STOCK E                       0.8 ml
    Super H2O                     7.83 ml

Osmolarity should be 265-285 mOsm.
Add 2.5 ml of heat inactivated sheep serum
and filter sterilize.
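
As a consistency check on the Table, the stock volumes for each
working medium should add up to the stated 10 ml, and adding 2.5 ml
of serum to 10 ml of medium should give the 20% v/v supplementation
mentioned for embryo culture. The sketch below verifies both and
scales a recipe to another batch size; the dictionary and function
names are illustrative and not part of the original protocol.

```python
import math

# Volumes in ml, as listed in the Table.
BICARBONATE_ML = {"Stock A": 1.0, "Stock B": 1.0, "Stock C": 0.07,
                  "Stock D": 0.1, "Super H2O": 7.83}
HEPES_ML = {"Stock A": 1.0, "Stock B": 0.2, "Stock C": 0.07,
            "Stock D": 0.1, "Stock E": 0.8, "Super H2O": 7.83}
SERUM_ML = 2.5  # heat inactivated sheep serum added per 10 ml of medium


def total_volume_ml(recipe: dict) -> float:
    """Sum of all component volumes in a recipe."""
    return sum(recipe.values())


def serum_fraction(recipe: dict, serum_ml: float = SERUM_ML) -> float:
    """Serum as a fraction (v/v) of the final supplemented medium."""
    return serum_ml / (total_volume_ml(recipe) + serum_ml)


def scale_recipe(recipe: dict, final_ml: float) -> dict:
    """Scale every component so the unsupplemented medium totals final_ml."""
    factor = final_ml / total_volume_ml(recipe)
    return {name: volume * factor for name, volume in recipe.items()}


if __name__ == "__main__":
    for label, recipe in (("bicarbonate", BICARBONATE_ML), ("HEPES", HEPES_ML)):
        print(label, round(total_volume_ml(recipe), 3), "ml,",
              f"serum {serum_fraction(recipe):.0%} v/v")
    print(scale_recipe(BICARBONATE_ML, 50.0))
```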

Recipient ewes are treated with an intravaginal
progesterone-impregnated sponge (Chronogest Ewe Sponge or
Chronogest Ewe-Lamb Sponge, Intervet) left in situ for 10
or 12 days. The ewes are injected intramuscularly with
1.5 ml (300 iu) of a follicle stimulating hormone
substitute (P.M.S.G., Intervet) and with 0.5 ml of a
luteolytic agent (Estrumate, Coopers Pitman-Moore) at
sponge removal on day -1. The ewes are tested for estrus
with a vasectomized ram between 8:00 am and 5:00 pm on
days 0 and 1.

Embryos surviving in vitro culture are returned
to recipients (starved from 5:00 pm on day 5 or 6) on day
6 or 7. Embryo transfer is carried out under general
anesthesia as described above. The uterus is exteriorized
via a laparotomy incision with or without laparoscopy.
Embryos are returned to one or both uterine horns only in
ewes with at least one suitable corpus luteum. After
replacement of the uterus, the abdomen is closed, and the
recipients are allowed to recover. The animals are given
an intramuscular injection of Amoxypen L.A. at the
manufacturer's recommended dose rate immediately pre- or
post-operatively.

Lambs are identified by ear tags and left with
their dams for rearing. Ewes and lambs are either housed
and fed complete diet concentrates and other supplements
and/or ad lib. hay, or are let out to grass.

Within the first week of life (or as soon
thereafter as possible without prejudicing health), each
lamb is tested for the presence of the heterologous DNA by
two sampling procedures. Following tail biopsy, within a
week, a 10 ml blood sample is taken from the jugular vein
into an EDTA vacutainer. Tissue samples are taken by tail
biopsy as soon as possible after the tail has become
desensitized after the application of a rubber elastrator
ring to its proximal third (usually within 200 minutes
after "tailing"). The tissue is placed immediately in a
solution of tail buffer. Tail samples are kept at room
temperature and analyzed on the day of collection. All
lambs are given an intramuscular injection of Amoxypen
L.A. at the manufacturer's recommended dose rate
immediately post-biopsy, and the cut end of the tail is
sprayed with an antibiotic spray.

DNA is extracted from sheep blood by first
separating white blood cells. A 10 ml sample of blood is
diluted in 20 ml of Hank's buffered saline (HBS; obtained
from Sigma Chemical Co.). Ten ml of the diluted blood is
layered over 5 ml of Histopaque (Sigma) in each of two 15
ml screw-capped tubes. The tubes are centrifuged at 3000
rpm (2000 x g max.), low brake for 15 minutes at room
temperature. White cell interfaces are removed to a clean
15 ml tube and diluted to 15 ml in HBS. The diluted cells
are spun at 3000 rpm for 10 minutes at room temperature,
and the cell pellet is recovered and resuspended in 2-5 ml
of tail buffer.

To extract DNA from the white cells, 10% SDS is
added to the resuspended cells to a final concentration of
1%, and the tube is inverted to mix the solution. One mg
of fresh proteinase K solution is added, and the mixture
is incubated overnight at 45°C. DNA is extracted using an
equal volume of phenol/chloroform (x3) and
chloroform/isoamyl alcohol (x1). The DNA is then
precipitated by adding 0.1 volume of 3 M NaOAc and 2
volumes of ethanol, and the tube is inverted to mix. The
precipitated DNA is spooled out using a clean glass rod
with a sealed end. The spool is washed in 73% ethanol,
and the DNA is allowed to partially dry and is then
redissolved in TE (10 mM Tris-HCl, 1 mM EDTA, pH 7.5).
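
The precipitation step can be expressed as a small volume
calculation. This is a sketch under two assumptions that the text
does not spell out: "volumes" are taken relative to the aqueous
sample before the salt is added, and liquid volumes are treated as
simply additive (the slight contraction on mixing ethanol with water
is ignored). The function name is illustrative.

```python
def precipitation_volumes(sample_ml: float,
                          salt_stock_molar: float = 3.0,
                          salt_volumes: float = 0.1,
                          ethanol_volumes: float = 2.0) -> dict:
    """Volumes and approximate concentrations for an ethanol precipitation."""
    salt_ml = salt_volumes * sample_ml
    ethanol_ml = ethanol_volumes * sample_ml
    aqueous_ml = sample_ml + salt_ml
    total_ml = aqueous_ml + ethanol_ml
    return {
        "salt_ml": salt_ml,
        "ethanol_ml": ethanol_ml,
        "total_ml": total_ml,
        # Salt concentration in the aqueous phase before ethanol is added.
        "salt_molar_before_ethanol": salt_stock_molar * salt_ml / aqueous_ml,
        # Ethanol fraction (v/v), assuming absolute ethanol is used.
        "ethanol_fraction": ethanol_ml / total_ml,
    }


if __name__ == "__main__":
    for key, value in precipitation_volumes(0.5).items():
        print(f"{key}: {value:.3f}")
```

With 0.1 volume of 3 M NaOAc the aqueous phase ends up near 0.27 M
acetate, close to the 0.3 M sodium acetate quoted for precipitation
of the gradient fractions.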



DNA samples from blood and tail are analyzed by
Southern blotting using probes for the BLG promoter region
and the protein C coding regions.

Example 5

A founder female animal, designated 30851, which
is transgenic for both BLG and pCORP9 was generated. She
has given rise to two sons and a transgenic daughter,
designated 40387. Recombinant transgenic protein C was
purified from milk (from 30851) by a single chromatography
step using a calcium-dependent monoclonal antibody
affinity column. Briefly, the milk samples were pooled up
to a volume of 40 ml. Two volumes of ice-cold 1 X TBS (50
mM Tris-HCl, 150 mM NaCl pH 6.5) and 200 mM EDTA, pH 6.5
were added to solubilise the caseins. The EDTA-treated
milk solution was centrifuged at 15,000 rpm for 30 minutes
at 4°C in a JA20 rotor (Beckman Instruments, Irvine, CA).
After centrifugation, the upper lipid phase and the small
pellet were discarded.

The EDTA-treated milk was diluted with an equal
volume of ice-cold 1 X TBS and 133 mM CaCl2 while
stirring. A cloudy precipitate formed upon addition of
the CaCl2. The pH was quickly adjusted by addition of a
few drops of 4 M NaOH, and the precipitate was
redissolved. Any remaining insoluble material was removed
by filtration through a 0.45 µm filter.

The optical density of the solubilised milk was
measured at 280 nm, and the protein concentration was
calculated. The milk was diluted to a protein
concentration of 10 mg/ml using 1 X TBS containing CaCl2
to give a final Ca++ concentration of 25 mM. The milk was
used to resuspend antibody-Sepharose that carried the
immobilized Ca++-dependent monoclonal antibody PCL-2, and
had been washed in 1 X TBS and 25 mM CaCl2. PCL-2 is a
monoclonal antibody that binds single chain and two chain
protein C, whether or not they are gamma-carboxylated.
The milk-Sepharose mixture was incubated overnight at 4°C.



WO 97/20043                                  PCT/US96/18866



The matrix was washed twice in batch with 1 X
TBS and 25 mM CaCl2 and packed into a glass column. The
resin was washed at a flow rate of 1 ml/min with a calcium
containing buffer and a stable baseline was achieved
before the bound protein was eluted with an isocratic
elution using 1 X TBS and 25 mM EDTA, pH 6.5. Fractions
containing protein C were pooled and concentrated to
approximately 1 ml using an Amicon ultrafiltration unit
with a 10 kDa cut-off membrane (Amicon, Danvers, MA).

The monoclonal antibody, PCL-2, was coupled to
the activated Sepharose 4B as follows: 1 g (3.5 ml of gel)
of cyanogen bromide activated Sepharose 4B (Pharmacia LKB
Biotechnology, Piscataway, NJ) was swollen for 15 minutes
in 1 mM HCl. The swollen gel was resuspended in 0.1 M
NaHCO3, 0.5 M NaCl pH 8.3 and washed several times. The
washed gel was resuspended in 11 ml of monoclonal antibody
solution (PCL-2, 3.5 mg/ml in bicarbonate buffer pH 8.3)
with a coupling ratio of approximately 10 mg/ml gel.
Coupling was allowed to proceed for 2 h at room
temperature on a rotary mixer, and the gel was recovered
by gentle centrifugation. The monoclonal supernatant was
removed and replaced by 1 M ethanolamine in order to block
any remaining sites on the Sepharose. Blocking was
performed overnight at 4°C. Excess adsorbed protein was
removed by sequential acid and alkali washes (0.1 M
acetate, 0.5 M NaCl pH 4.0; 0.1 M NaHCO3, 0.5 M NaCl pH
8.3), and the coupled gel was stored in 50 mM Tris-HCl,
150 mM NaCl pH 6.5, 0.02% azide.
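
The quoted coupling ratio follows directly from the amounts given.
The sketch below reproduces it, assuming that all of the antibody
offered is counted toward the ratio (the text does not report a
measured coupling efficiency); the function names are illustrative.

```python
def swollen_gel_ml(dry_gel_g: float, ml_per_g: float = 3.5) -> float:
    """Swollen gel volume, using the stated 3.5 ml of gel per gram."""
    return dry_gel_g * ml_per_g


def coupling_ratio_mg_per_ml(antibody_ml: float,
                             antibody_mg_per_ml: float,
                             gel_ml: float) -> float:
    """Antibody offered per ml of swollen gel."""
    return antibody_ml * antibody_mg_per_ml / gel_ml


if __name__ == "__main__":
    gel_ml = swollen_gel_ml(1.0)  # 1 g of activated Sepharose 4B
    ratio = coupling_ratio_mg_per_ml(11.0, 3.5, gel_ml)
    print(f"{11.0 * 3.5:.1f} mg antibody on {gel_ml} ml gel: "
          f"{ratio:.1f} mg/ml gel")
```

Eleven ml at 3.5 mg/ml is 38.5 mg of antibody, or about 11 mg per ml
of gel, in line with the stated ratio of approximately 10 mg/ml.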

Example 6

Samples of purified recombinant transgenic
protein C were compared with plasma-derived protein C and
a plasma-derived activated protein C (APC) preparation.
Samples were run on SDS PAGE 4-20% acrylamide gradient
gels under reducing conditions and silver stained for
protein.

The plasma-derived material shows the presence
of a heavy-chain doublet around 44 kDa (Figure 1, Lane 1).
This has been reported to be due to partial occupancy of
the three possible N-linked glycosylation sites on the
molecule. A similar doublet, although of a slightly lower
mass presumably due to some subtle change in glycosylation
profile, has also been seen with the transgenic protein C.
The light chain was visible around 22 kDa for both
preparations. Significantly, in the case of the plasma-
derived material uncleaved single-chain was clearly
visible above the heavy chain doublet. Plasma-derived
protein normally contained 5-10 percent of this inactive
material. In contrast, the transgenic protein C contains
no obvious single chain by this gel analysis. Therefore,
it contains less than a few percent at most of inactive
material. This most likely reflects the increased
efficiency of cleavage of the modified inter-chain site.
In further support of this observation no single chain was
visible by direct western blot analysis of transgenic
sheep milk (40387, expression level 300 ng/ml).

The purified transgenic protein C was further
characterized as follows:

A. ELISA

An enzyme-linked immunosorbent assay (ELISA) for
protein C was done as follows: Affinity-purified
polyclonal antibody to human protein C (100 µl of 1 µg/ml
in 0.1 M Na2CO3, pH 9.6) was added to each well of a 96-
well microtiter plate, and the plates were incubated
overnight at 4°C. The wells were then washed three times
with phosphate buffered saline (PBS) containing 0.05%
Tween-20 and incubated with 100 µl of 1% bovine serum
albumin (BSA), 0.05% Tween-20 in PBS at 4°C overnight. The
plates were then rinsed several times with PBS, air dried
and stored at 4°C. To assay samples, 100 µl of each sample
was incubated for 1 h at 37°C with a biotin-conjugated
sheep polyclonal antibody to protein C (30 ng/ml) in PBS
containing 1% BSA and 0.05% Tween-20. After incubation,
the wells were rinsed with PBS, and alkaline phosphatase
activity was measured by the addition of 100 µl of
phosphatase substrate (Sigma, St. Louis, MO) in 10%
diethanolamine, pH 9.8, containing 0.3 mM MgCl2. The
absorbance at 405 nm was read on a microtiter plate
reader. Quantitation was by comparison with a standard
curve using plasma-derived protein C quantitated by amino
acid analysis.

B. Amino-Terminal Sequencing

Amino-terminal sequencing of the transgenic
material was performed to ascertain the extent of
prosequence removal and to evaluate the presence of gamma-
carboxylation. There were three possible N-terminal
sequences of protein C. These were: 1) the prosequence,
which directs gamma-carboxylation and could have remained
on the light chain if the first cleavage site was
incompletely processed, 2) the light chain and 3) the
heavy chain. N-terminal sequencing of protein C obtained
from transgenic milk should have contained only the latter
two sequences if correct processing had occurred at both
of the cleavage sites. Amino-terminal sequencing would
have also been expected to reveal the presence of gamma-
carboxylation in the light chain. There are nine sites of
carboxylation in the first twenty-nine amino acids of the
light chain. On an analysis of released amino acids, the
PTH-gamma carboxylic acid derivatives eluted from the HPLC
column in the break-through and could therefore not be
analyzed. Thus, a gamma carboxylic acid showed up on the
amino-terminal sequence as a space rather than a glutamic
acid.

The yields of amino acids in pmol released from
the sequencing of approximately 27 pmol (1.4 µl) of
purified transgenic protein C corresponded well to those
expected for an equimolar mixture of light and heavy
chains, and no obvious sequence was discernible for the
prosequence. Moreover, no other aberrant sequences were
detected, thus indicating a lack of inappropriate
proteolytic cleavages.

As stated previously, gamma-carboxylated
glutamate residues were expected to sequence as blanks
using standard instrument conditions. However, sequencing
protein C gives a double sequence which must be
deconvoluted using knowledge of the expected light and
heavy chain sequences. Normally, if the light chain alone
were sequenced the gla residues at positions six and seven
would appear as blanks. However, when sequenced as intact
protein C, the heavy chain sequence contains a glutamate
residue at position six. Therefore, the only indirect
confirmation of the presence of a gla residue in the light
chain was the absence of glutamate at position seven which
was not 'overwritten' by a glutamate in the heavy chain
(Figure 2). Two other indirect confirmations of the
presence of gamma carboxylation of the transgenic product
are described below.

C. Mass Analysis of the Purified Light Chain

The protein sequence of the transgenic-derived
protein C precursor had been modified with an Arg-Arg-Lys-
Arg (SEQ ID NO: 20) cleavage site between the light and
heavy chains to promote more efficient cleavage of the
single chain to 2-chain form. Western blot analysis of
the transgenic protein C milk and examination of the
purified protein C on reducing gels had already confirmed
that efficient cleavage had occurred. Normally during
secretion, but after cleavage of the plasma-derived
material, the two basic amino acids at the carboxy-
terminus of the light chain are trimmed back by a basic
carboxypeptidase. Establishing whether the carboxy-terminus
of the transgenic protein C light chain had been processed
to remove the two extra basic amino acids introduced by
this modification, as well as the two natural ones, was
achieved by measuring the mass of the purified light chain
in a quadrupole instrument using on-line liquid
chromatography and electrospray ionization. In order to
achieve this, all of the cysteine residues of protein C
were reduced and alkylated, and then the two chains were
separated by reversed-phase chromatography.

C1. Reductive Alkylation

Because protein C is heavily cross-linked for a
molecule of approximately 52 kDa, with twelve disulfide
bridges (17 of the 24 cysteines involved are in the light
chain), it was necessary to reductively alkylate the
entire protein before attempting to separate the chains by
reversed-phase chromatography. In view of the large
number of cysteines in the light chain, alkylation was
done with iodoacetamide, in place of the more commonly
used vinyl pyridine, to prevent the molecule from becoming
excessively hydrophobic.

The transgenic protein C material (6 nmol of
protein or 144 nmol of thiol) was reductively alkylated as
follows: 0.5 mg of protein C (by ELISA) in 0.5 ml of TBS
was added to 50 µl of 1 M Tris pH 8.0, 450 µl water, 570
mg guanidinium chloride, and 10 µl at 50 mg/ml DTT (0.3
µmol representing a 20 fold excess of added thiol over
cysteine thiol). The mixture was incubated for 2 hours at
37°C. After incubation, 20 µl at 120 mg/ml iodoacetamide
(0.6 M representing a 2 fold excess over DTT on a molar
basis) was added, and the mixture was incubated in the
dark for one hour at 4°C. The reaction was quenched by
adding 50 µl at 50 mg/ml DTT representing a 2.5 fold
excess over iodoacetamide. The sample (final volume 1.5
ml) was stored at -20°C until analysis.

D. Purification of the Light Chain

Purification of protein C light chain was
achieved using a large pore polystyrene column with
divinyl benzene interactive groups (PLRP-S, 4000 Å, 8 µm,
2.1 mm ID: Polymer Laboratories, Shropshire, UK). The
optimum conditions for separation of the heavy and light
chains were determined to be: solvent A (0.1% TFA) and
solvent B (100% acetonitrile) at a flow of 0.5 ml/min with
a detector wavelength of 215 nm and a gradient of 30 to
60% solvent B over 60 min.
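
The gradient conditions can be sketched numerically. The code below
assumes the gradient is linear in time, which the text implies but
does not state outright, and the function names are illustrative.

```python
def percent_b(t_min: float, start_pct: float = 30.0,
              end_pct: float = 60.0, duration_min: float = 60.0) -> float:
    """Solvent B percentage at time t for a linear gradient."""
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("time lies outside the gradient window")
    return start_pct + (end_pct - start_pct) * t_min / duration_min


def solvent_b_volume_ml(flow_ml_per_min: float = 0.5,
                        start_pct: float = 30.0, end_pct: float = 60.0,
                        duration_min: float = 60.0) -> float:
    """Volume of solvent B delivered over the whole linear gradient."""
    mean_fraction = (start_pct + end_pct) / 2.0 / 100.0
    return flow_ml_per_min * duration_min * mean_fraction


if __name__ == "__main__":
    for t in (0, 15, 30, 45, 60):
        print(f"t = {t:2d} min: {percent_b(t):.1f}% B")
    print(f"Solvent B used: {solvent_b_volume_ml():.1f} ml")
```

Under this assumption the gradient rises by 0.5% B per minute, and a
full run at 0.5 ml/min passes 30 ml through the column.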
Fractions were collected across the eluted
peaks, and samples (10 µl) were analyzed by SDS PAGE on 4-
20% gradient acrylamide gels under non-reducing
conditions. The light chain (fractions 3 to 5) was
completely resolved from both the heavy chain (fractions 7
to 9) and a single fraction (6) which contained a mixture
of heavy chain and what appeared to be unglycosylated
light chain.

A sample containing fully resolved light chain
was prepared for deglycosylation by centrifugal
evaporation under reduced pressure at room temperature.
Deglycosylation was carried out using peptide N-glycanase
(PNGase; Oxford Glycosystems, Oxford, UK). The protein
sample was redissolved in 50 µl of buffer and incubated
overnight with 1 unit (5 µl) PNGase, according to
manufacturer's specifications.

The light chain was purified from reduced and
alkylated plasma-derived protein C by the same method and
deglycosylated for further analysis.

E. Analysis by Mass Spectroscopy

Samples of purified light chain were subjected
to mass analysis using a liquid chromatography
electrospray interface to a Sciex Quadrupole Mass Analyser
(Sciex/Perkin Elmer, Toronto, CA). The LC system used a
0.5 mm ID column packed with PLRP-S 4000 Å, 8 µm resin
(Polymer Laboratories). The solvent system contained
buffer A (0.1% formic acid), buffer B (0.1% formic acid
and a 5:2 (v/v) mixture of ethanol to propan-1-ol). The
gradient used was from 5-60% buffer B over 35 minutes at a
flow rate of 25 µl per minute. The outflow of the column
was linked via a UV detector to the mass spectrometer
which was run in positive-ion mode.

The purified and deglycosylated transgenic light
chain was analyzed and gave a relatively weak spectrum
which was reconstructed to give two components with masses
of 18,911.0 and 18,971.0. The plasma light chain was also
analyzed and gave a stronger signal with a single major
component. The spectrum of the plasma light chain was
reconstructed to give a single mass of 18,970.0.

The predicted mass for the light chain carrying
nine gamma-carboxy glutamic acids, one β-hydroxy aspartic
acid and seventeen carbamidomethyl cysteine residues and
ending with Leu 155 was 18966.9723, which is very close to
the masses detected for the transgenic (18,971.0) and
plasma-derived (18,970.0) light chains. The small
differences in mass were well within the accuracy
limitations for this instrument, particularly with the LC
delivery. This shows that the mass of the reductively-
alkylated and deglycosylated transgenic light chain is
essentially identical to that for the plasma-derived
protein C. This implies that both molecules have
undergone the same post-translational modifications and
that the transgenic material is fully gamma carboxylated,
has had all four basic amino acids trimmed back from the
carboxy-terminus of the light chain and has a single
β-hydroxy aspartic acid.
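
The agreement between predicted and observed masses can be
quantified with a short calculation. The sketch below uses only the
mass values quoted above; the helper name and the choice to report
the difference in both daltons and parts per million are
illustrative, and no instrument tolerance is assumed because the
text does not give one.

```python
PREDICTED_MASS = 18966.9723  # predicted light chain mass from the text

# Reconstructed masses reported for each preparation.
OBSERVED_MASSES = {
    "transgenic light chain": 18971.0,
    "plasma-derived light chain": 18970.0,
}


def mass_error(observed: float, predicted: float = PREDICTED_MASS) -> tuple:
    """Return (difference in Da, difference in ppm) for one measurement."""
    delta = observed - predicted
    return delta, delta / predicted * 1e6


if __name__ == "__main__":
    for label, observed in OBSERVED_MASSES.items():
        delta_da, delta_ppm = mass_error(observed)
        print(f"{label}: {delta_da:+.4f} Da ({delta_ppm:+.1f} ppm)")
```

Both measurements lie within about 4 Da of the prediction, and
within about 1 Da of each other, which is the basis for treating the
two light chains as carrying the same modifications.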
F. Activity Measurements

The activity of the transgenic protein C was
compared with that of the plasma-derived material in a
coagulation assay. First each sample of protein C,
quantitated by amino acid composition analysis, was
activated by incubation with Protac, a snake venom
(American Diagnostica Inc, Greenwich, CT) at a venom to
protein ratio of 1 Unit Protac: 10 µg protein C for 60
minutes at 37°C. Aliquots of the activated material were
then compared for their ability to prolong the clotting
time of protein C depleted human plasma (Diagnostic
Reagents Ltd) in the presence of activated partial
thromboplastin time reagent - cephalin from rabbit brain
(Sigma) and calcium using a mechanical coagulometer
(Diagnostica Stago, Asnières, FR). A comparison of
clotting times with various additions of transgenic and
plasma-derived protein C (Figure 3) shows that the two
preparations had the same anti-coagulant activity per mg
of protein.

In summary, results show that the sheep-derived
transgenic protein C is correctly post-translationally
processed, with respect to gamma-carboxylation and
probably beta-hydroxylation, and has anticoagulant
activity fully equivalent to a high quality purified
plasma standard. The results demonstrate that the C-
terminal processing of the light chain, with the modified
RRKR cleavage site rather than the naturally occurring KR
site, has the two extra basic amino acids removed along
with the natural ones.

From the foregoing, it will be appreciated that,
although specific embodiments of the invention have been
described herein for purposes of illustration, various
modifications may be made without deviating from the
spirit and scope of the invention. Accordingly, the
invention is not limited except as by the appended claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

     (i) APPLICANTS: ZymoGenetics, Inc.
                     1201 Eastlake Avenue East
                     Seattle
                     WA
                     USA
                     98102

                     PPL Therapeutics
                     Roslin
                     Edinburgh
                     Scotland
                     UK
                     EH25 9PP

    (ii) TITLE OF INVENTION: PROTEIN C PRODUCTION IN TRANSGENIC
         ANIMALS

   (iii) NUMBER OF SEQUENCES: 25

    (iv) CORRESPONDENCE ADDRESS:
          (A) ADDRESSEE: ZymoGenetics, Inc.
          (B) STREET: 1201 Eastlake Avenue East
          (C) CITY: Seattle
          (D) STATE: WA
          (E) COUNTRY: USA
          (F) ZIP: 98102

     (v) COMPUTER READABLE FORM:
          (A) MEDIUM TYPE: Floppy disk
          (B) COMPUTER: IBM PC compatible
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.25

    (vi) CURRENT APPLICATION DATA:
          (A) APPLICATION NUMBER:
          (B) FILING DATE:
          (C) CLASSIFICATION:

  (viii) ATTORNEY/AGENT INFORMATION:
          (A) NAME: Sawislak, Deborah A.
          (B) REGISTRATION NUMBER: 37,438
          (C) REFERENCE/DOCKET NUMBER: 95-28PC

    (ix) TELECOMMUNICATION INFORMATION:
          (A) TELEPHONE: 206-442-6672
          (B) TELEFAX: 206-442-6678

(2) INFORMATION FOR SEQ ID NO:1:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11725 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: join(3520..3630, 5093..5117, 5210..5347,
              5450..5584, 8253..8395, 9269..9386, 10516..11102)

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AGTGAATCTG GGCGAGTAAC ACAAAACTTG AGTGTCCTTA CCTGAAAAAT AGAGGTTAGA 60

GGGATGCTAT GTGCCATTGT GTGTGTGTGT TGGGGGTGGG GATTGGGGGT GATTTGTGAG 120

CAATTGGAGG TGAGGGTGGA GCCCAGTGCC CAGCACCTAT GCACTGGGGA CCCAAAAAGG 180

AGCATCTTCT CATGATTTTA TGTATCAGAA ATTGGGATGG CATGTCATTG GGACAGCGTC 240

TTTTTTCTTG TATGGTGGCA CATAAATACA TGTGTCTTAT AATTAATGGT ATTTTAGATT 300

TGACGAAATA TGGAATATTA CCTGTTGTGC TGATCTTGGG CAAACTATAA TATCTCTGGG 360

CAAAAATGTC CCCATCTGAA AAACAGGGAC AACGTTCCTC CCTCAGCCAG CCACTATGGG 420

GCTAAAATGA GACCACATCT GTCAAGGGTT TTGCCCTCAC CTCCCTCCCT GCTGGATGGC 480

ATCCTTGGTA GGCAGAGGTG GGCTTCGGGC AGAACAAGCC GTGCTGAGCT AGGACCAGGA 540

GTGCTAGTGC CACTGTTTGT CTATGGAGAG GGAGGCCTCA GTGCTGAGGG CCAAGCAAAT 600

ATTTGTGGTT ATGGATTAAC TCGAACTCCA GGCTGTCATG GCGGCAGGAC GGCGAACTTG 660 

CAGTATCTCC ACGACCCGCC CCTGTGAGTC CCCCTCCAGG CAGGTCTATG AGGGGTGTGG 720 

AGGGAGGGCT GCCCCCGGGA GAAGAGAGCT AGGTGGTGAT GAGGGCTGAA TCCTCCAGCC 780 

AGGGTGCTCA ACAAGCCTGA GCTTGGGGTA AAAGGACACA AGGCCCTCCA CAGGCCAGGC 840 

CTGGCAGCCA CAGTCTCAGG TCCCTTTGCC ATGCGCCTCC CTCTTTCCAG GCCAAGGGTC 900 

CCCAGGCCCA GGGCCATTCC AACAGACAGT TTGGAGCCCA GGACCCTCCA TTCTCCCCAC 960 

CCCACTTCCA CCTTTGGGGG TGTCGGATTT GAACAAATCT CAGAAGCGGC CTCAGAGGGA 1020 

GTCGGCAAGA ATGGAGAGCA GGGTCCGGTA GGGTGTGCAG AGGCCACGTG GCCTA7CCAC 1080 

TGGGGAGGGT TCCTTGATCT CTGGCCACCA GGGCTATCTC TGTGGCCTTT TGGAGCAACC 1140 

TGGTGGTTTG GGGCAGGGGT TGAATTTCCA GGCCTAAAAC CACACAGGCC TGGCCTTGAG 1200 

TCCTGGCTCT GCGAGTAATG CATGGATGTA AACATGGAGA CCCAGGACCT TGCCTCAGTC 1260 

TTCCGAGTCT GGTGCCTGCA GTGTACTGAT GGTGTGAGAC CCTACTCCTG GAGGATGGGG 1320 

GACAGAATCT GATCGATCCC CTGGGTTGGT GACTTCCCTG TGCAATCAAC GGAGACCAGC 1380 

AAGGGTTGGA TTTTTAATAA ACCACTTAAC TCCTCCGAGT CTCAGTTTCC CCCTCTATGA 1440 

AATGGGGTTG ACAGCATTAA TAACTACCTC TTGGGTGGTT GTGAGCCTTA ACTGAA3TCA 1500 

TAATATCTCA TGTTTACTGA GCATGAGCTA TGTGCAAAGC CTGTTTTGAG AGCTTWGT 1560 

GGACTAACTC CTTTAATTCT CACAACACCC TTTAAGGCAC AGATACACCA CGTTATrCCA 1620 

TCCATTTTAC AAATGAGGAA ACTGAGGCAT GGAGCAGTTA AGCATCTTGC CCAACATTGC 1680 

CCTCCAGTAA GTGCTGGAGC TGGAATTTGC ACCGTGCAGT CTGGCTTCAT GGCCTGXCT 1740 

GTGAATCCTG TAAAAATTGT TTGAAAGACA CCATGAGTGT CCAATCAACG TTAGCT.WTA 1800 

TTCTCAGCCC AGTCATCAGA CCGGCAGAGG CAGCCACCCC ACTGTCCCCA GGGAGGACAC 1860 

AAACATCCTG GCACCCTCTC CACTGCATTC TGGAGCTGCT TTCTAGGCAG GCAGTGTGAG 1920

CTCAGCCCCA CGTAGAGCGG GCAGCCGAGG CCTTCTGAGG CTATGTCTCT AGCGAACAAG 1980

GACCCTCAAT TCCAGCTTCC GCCTGACGGC CAGCACACAG GGACAGCCCT TTCATTCCGC 2040

TTCCACCTGG GGGTGCAGGC AGAGCAGCAG CGGGGGTAGC ACTGCCCGGA GCTCAGAAGT 2100

CCTCCTCAGA CAGGTGCCAG TGCCTCCAGA ATGTGGCAGC TCACAAGCCT CCTGCTGTTC 2160

GTGGCCACCT GGGGAATTTC CGGCACACCA GCTCCTCTTG GTAAGGCCAC CCCACCCCTA 2220

CCCCGGGACC CTTGTGGCCT CTACAAGGCC CTGGTGGCAT CTGCCCAGGC CTTCACAGCT 2280

TCCACCATCT CTCTGAGCCC TGGGTGAGGT GAGGGGCAGA TGGGAATGGC AGGAATCAAC 2340

TGACAAGTCC CAGGTAGGCC AGCTGCCAGA GTGCCACACA GGGGCTGCCA GGGCAGGCAT 2400

GCGTGATGGC AGGGAGCCCC GCGATGACCT CCTAAAGCTC CCTCCTCCAC ACGGGGATGG 2460

TCACAGAGTC CCCTGGGCCT TCCCTCTCCA CCCACTCACT CCCTCAACTG TGAAGACCCC 2520

AGGCCCAGGC TACCGTCCAC ACTATCCAGC ACAGCCTCCC CTACTCAAAT GCACACTGGC 2580

CTCATGGCTG CCCTGCCCCA ACCCCTTTCC TGGTCTCCAC AGCCAACGGG AGGAGGCCAT 2640

GATTCTTGGG GAGGTCCGCA GGCACATGGG CCCCTAAAGC CACACCAGGC TGTTGGTTTC 2700

ATTTGTGCCT TTATAGAGCT GTTTATCTGC TTGGGACCTG CACCTCCACC CTTTCCCAAG 2760

GTGCCCTCAG CTCAGGCATA CCCTCCTCTA GGATGCCTTT TCCCCCATCC CTTCTTGCTC 2820

ACACCCCCAA CTTGATCTCT CCCTCCTAAC TGTGCCCTGC ACCAAGACAG ACACTTCACA 2880

GAGCCCAGGA CACACCTGGG GACCCTTCCT GGGTGATAGG TCTGTCTATC CTCCAGGTGT 2940

CCCTGCCCAA GGGGAGAAGC ATGGGGAATA CTTGGTTGGG GGAGGAAAGG AAGACTGGGG 3000

GGATGTGTCA AGATGGGGCT GCATGTGGTG TACTGGCAGA AGAGTGAGAG GATTTAACTT 3060

GGCAGCCTTT ACAGCAGCAG CCAGGGCTTG AGTACTTATC TCTGGGCCAG GCTGTATTGG 3120

ATGTTTTACA TGACGGTCTC ATCCCCATGT TTTTGGATGA GTAAATTGAA CCTTAGAAAG 3180

GTAAAGACAC TGGCTCAAGG TCACACAGAG ATCGGGGTGG GGTTCACAGG GAGGCCTGTC 3240

CATCTCAGAG CAAGGCTTCG TCCTCCAACT GCCATCTGCT TCCTGGGGAG GAAAAGAGCA 3300

GAGGACCCCT GCGCCAAGCC ATGACCTAGA ATTAGAATGA GTCTTGAGGG GGCGClAGACA 3360 

AGACCTTCCC AGGCTCTCCC AGCTCTGCTT CCTCAGACCC CCTCATGGCC CCAGCCCCTC 3420 

TTAGGCCCCT CACCAAGGTG AGCTCCCCTC CCTCCAAAAC CAGACTCAGT GTTCTCCAGC 3480

AGCGAGCGTG CCCACCAGGT GCTGCGGATC CGCAAACGT GCC AAC TCC TTC CTG 3534 

GAG GAG CTC CGT CAC AGC AGC CTG GAG CGG GAG TGC ATA GAG GAG ATC 3582 

TGT GAC TTC GAG GAG GCC AAG GAA ATT TTC CAA AAT GTG GAT GAC ACA 3630 

GTAAGGCCAC CATGGGTCCA GAGGATGAGG CTCAGGGGCG AGCTGGTAAC CAGC^GGGGC 3690 

CTCGAGGAGC AGGTGGGGAC TCAATGCTGA GGCCCTCTTA GGAGTTGTGG GGGTGGCTGA 3750 

GTGGAGCGAT TAGGATGCTG GCCCTATGAT GTCGGCCAGG CACATGTGAC TGCAAGAAAC 3810 

AGAATTCAGG AAGAAGCTCC AGGAAAGAGT GTGGGGTGAC CCTAGGTGGG GACTCCCACA 3870 

GCCACAGTGT AGGTGGTTCA GTCCACCCTC CAGCCACTGC TGAGCACCAC TGCCTCCCCG 3930 

TCCCACCTCA CAAAGAGGGG ACCTAAAGAC CACCCTGCTT CCACCCATGC CTCTGCTGAT 3990 

CAGGGTGTGT GTGTGACCGA AACTCACTTC TGTCCACATA AAATCGCTCA CTCTGTGCCT 4050 

CACATCAAAG GGAGAAAATC TGATTGTTCA GGGGGTCGGA AGACAGGGTC TGTGTXTAT 4110 

TTGTCTAAGG GTCAGAGTCC TTTGGAGCCC CCAGAGTCCT GTGGACGTGG CCCTA3GTAG 4170 

TAGGGTGAGC TTGGTAACGG GGCTGGCTTC CTGAGACAAG GCTCAGACCC GCTCT3TCCC 4230 

TGGGGATCGC TTCAGCCACC AGGACCTGAA AATTGTGCAC GCCTGGGCCC CCTTC:AAGG 4290 

CATCCAGGGA TGCTTTCCAG TGGAGGCTTT CAGGGCAGGA GACCCTCTGG CCTGC^CCCT 4350 

CTCTTGCCCT CAGCCTCCAC CTCCTTGACT GGACCCCCAT CTGGACCTCC ATCCCZACCA 4410 

CCTCTTTCCC CAGTGGCCTC CCTGGCAGAC ACCACAGTGA CTTTCTGCAG GCACAFATCT 4470 

GATCACATCA AGTCCCCACC GTGCTCCCAC CTCACCCATG GTCTCTCAGC CCCAG:AGCC 4530 

TTGGCTGGCC TCTCTGATGG AGCAGGCATC AGGCACAGGC CGTGGGTCTC AACGTGGGCT 4590

GGGTGGTCCT GGACCAGCAG CAGCCGCCGC AGCAGCAACC CTGGTACCTG GTTAGGAACG 4650

CAGACCCTCT GCCCCCATCC TCCCAACTCT GAAAAACACT GGCTTAGGGA AAGGCGCGAT 4710 

GCTCAGGGGT CCCCCAAAGC CCGCAGGCAG AGGGAGTGAT GGGACTGGAA GGAGGCCGAG 4770 

TGACTTGGTG AGGGATTCGG GTCCCTTGCA TGCAGAGGCT GCTGTGGGAG CGGACAGTCG 4830 

CGAGAGCAGC ACTGCAGCTG CATGGGGAGA GGGTGTTGCT CCAGGGACGT GGGATGGAGG 4890 

CTGGGCGCGG GCGGGTGGCG CTGGAGGGCG GGGGAGGGGC AGGGAGCACC AGCTCCTAGC 4950 

AGCCAACGAC CATCGGGCGT CGATCCCTGT TTGTCTGGAA GCCCTCCCCT CCCCTGCCCG 5010 

CTCACCCGCT GCCCTGCCCC ACCCGGGCGC GCCCCTCCGC ACACCGGCTG CAGGAGCCTG 5070 

ACGCTGCCCG CTCTCTCCGC AG CTG GCC TTC TGG TCC AAG CAC GTC G 5117 

GTGAGTGCGT TCTAGATCCC CGGCTGGACT ACCGGCGCCC GCGCCCCTCG GGATCTCTGG 5177 

CCGCTGACCC CCTACCCCGC CTTGTGTCGC AG AC GGT GAC CAG TGC TTG GTC 5229 

TTG CCC TTG GAG CAC CCG TGC GCC AGC CTG TGC TGC GGG CAC GGC ACG 5277 

TGC ATC GAC GGC ATC GGC AGC TTC AGC TGC GAC TGC CGC AGC GGC TGG 5325 

GAG GGC CGC TTC TGC CAG CGC G GTGAGGGGGA GAGGTGGATG CTGGCGGGCG 5377 

GCGGGGCGGG GCTGGGGCCG GGTTGGGGGC GCGGCACCAG CACCAGCTGC CCGCGCCCTC 5437 

CCCTGCCCGC AG AG GTG AGC TTC CTC AAT TGC TCT CTG GAC AAC GGC 5484 

GGC TGC ACG CAT TAC TGC CTA GAG GAG GTG GGC TGG CGG CGC TGT AGC 5532 

TGT GCG CCT GGC TAC AAG CTG GGG GAC GAC CTC CTG CAG TGT CAC CCC 5580 

GCA G GTGAGAAGCC CCCAATACAT CGCCCAGGAA TCACGCTGGG TGCGGGGTGG 5634 

GCAGGCCCCT GACGGGCGCG GCGCGGGGGG CTCAGGAGGG TTTCTAGGGA GGGAGCGAGG 5694 

AACAGAGTTG AGCCTTGGGG CAGCGGCAGA CGCGCCCAAC ACCGGGGCCA CTGTTAGCGC 5754 

AATCAGCCCG GGAGCTGGGC GCGCCCTCCG CTTTCCCTGC TTCCTTTCTT CCTGGCGTCC 5814

CCGCTTCCTC CGGGCGCCCC TGCGACCTGG GGCCACCTCC TGGAGCGCAA GCCCA3TGGT 5874

GGCTCCGCTC CCCAGTCTGA GCGTATCTGG GGCGAGGCGT GCAGCGTCCT CCTCCTTCTA 5934

GCCTGGCTGC GTTTTTCTCT GACGTTGTCC GGCGTGCATC GCATTTCCCT CTTTAXCCC 5994

TTGCTTCCTT GAGGAGAGAA CAGAATCCCG ATTCTGCCTT CTTCTATATT TTCCTTTTTA 6054

TGCATTTTAA TCAAATTTAT ATATGTATGA AACTTTAAAA ATCAGAGTTT TACAACTCTT 6114 

ACACTTTCAG CATGCTGTTC CTTGGCATGG GTCCTTTTTT CATTCATTTT CATAA\AGGT 6174 

GGACCCTTTT AATGTGGAAA TTCCTATCTT CTGCCTCTAG GGCATTTATC ACTTATTTCT 6234 

TCTACAATCT CCCCTTTACT TCCTCTATTT TCTCTTTCTG GACCTCCCAT TATTCAGACC 6294 

TCTTTCCTCT AGTTTTATTG TCTCTTCTAT TTCCCATCTC TTTGACTTTG TGnT'CTTT 6354 

CAGGGAACTT TCTTTTTTTT CTTTTTTTTT GAGATGGAGT TTCACTCTTG TTGTCCCAGG 6414

CTGGAGTGCA ATGACGTGAT CTCAGCTCAC CACAACCTCC GCCTCCTGGA TTCAAGCGAT 6474 

TCTCCTGCCG CAGCCTCCCG AGTAGCTGGG ATTACAGGCA TGCGCCACCA CGCCCAGCTA 6534

ATTTTGTGTT TTTAGTAGAG AAGGGGTTTC TCCGTGTTGG TCAAGCTGGT CTTGAACTCC 6594 

TGACCTCAGG TGATCCACCT GCCTTGGCCT CCTAAAGTGC TGGGATTACA GGCGTGAGCC 6654 

ACCGCGCCCA GCCTCTTTCA GGGAACTTTC TACAACTTTA TAATTCAATT CTTCTGCAGA 6714 

AAAAAATTTT TGGCCAGGCT CAGTAGCTCA GACCAATAAT TCCAGCACTT TGAGACiGCTG 6774 

AGGTGGGAGG ATTGCTTGAG CTTGGGAGTT TGAGACTAGC CTGGGCAACA CAGTGAGACC 6834 

CTGTCTCTAT TTTTAAAAAA AGTAAAAAAA GATCTAAAAA TTTAACTTTT TATTTTGAAA 6894

TAATTAGATA TTTCCAGGAA GCTGCAAAGA AATGCCTGGT GGGCCTGTTG GCTGTGGGTT 6954 

TCCTGCAAGG CCGTGGGAAG GCCCTGTCAT TGGCAGAACC CCAGATCGTG AGGGCTTTCC 7014

TTTTAGGCTG CTTTCTAAGA GGACTCCTCC AAGCTCTTGG AGGATGGAAG ACGCTCACCC 7074 

ATGGTGTTCG GCCCCTCAGA GCAGGGTGGG GCAGGGGAGC TGGTGCCTGT GCAGGCTGTG 7134 

GACATTTGCA TGACTCCCTG TGGTCAGCTA AGAGCACCAC TCCTTCCTGA AGCGGGGCCT 7194

GAAGTCCCTA GTCAGAGCCT CTGGTTCACC TTCTGCAGGC AGGGAGAGGG GAGTCAAGTC 7254

AGTGAGGAGG GCTTTCGCAG TTTCTCTTAC AAACTCTCAA CATGCCCTCC CACCTGCACT 7314 

GCCTTCCTGG AAGCCCCACA GCCTCCTATG GTTCCGTGGT CCAGTCCTTC AGCTTCTGGG 7374 

CGCCCCCATC ACGGGCTGAG ATTTTTGCTT TCCAGTCTGC CAAGTCAGTT ACTGTGTCCA 7434 

TCCATCTGCT GTCAGCTTCT GGAATTGTTG CTGTTGTGCC CTTTCCATTC TTTTGTTATG 7494 

ATGCAGCTCC CCTGCTGACG ACGTCCCATT GCTCTTTTAA GTCTAGATAT CTGGACTGGG 7554 

CATTCAAGGC CCATTTTGAG CAGAGTCGGG CTGACCTTTC AGCCCTCAGT TCTCCATGGA 7614 

GTATGCGCTC TCTTCTTGGC AGGGAGGCCT CACAAACATG CCATGCCTAT TGTAGCAGCT 7674 

CTCCAAGAAT GCTCACCTCC TTCTCCCTGT AATTCCTTTC CTCTGTGAGG AGCTCAGCAG 7734 

CATCCCATTA TGAGACCTTA CTAATCCCAG GGATCACCCC CAACAGCCCT GGGGTACAAT 7794 

GAGCTTTTAA GAAGTTTAAC CACCTATGTA AGGAGACACA GGCAGTGGGC GATGCTGCCT 7854 

GGCCTGACTC TTGCCATTGG GTGGTACTGT TTGTTGACTG ACTGACTGAC TGACTGGAGG 7914 

GGGTTTGTAA TTTGTATCTC AGGGATTACC CCCAACAGCC CTGGGGTACA ATGAGCCTTC 7974 

AAGAAGTTTA ACAACCTATG TAAGGACACA CAGCCAGTGG GTGATGCTGC CTGGTCTGAC 8034 

TCTTGCCATT CAGTGGCACT GTTTGTTGAC TGACTGACTG ACTGACTGGC TGACTGGAGG 8094 

GGGTTCATAG CTAATATTAA TGGAGTGGTC TAAGTATCAT TGGTTCCTTG AACCCTGCAC 8154 

TGTGGCAAAG TGGCCCACAG GGTGGAGGAG GACCAAGACA GGAGGGCAGT CTCGGGAGGA 8214 

GTGCCTGGCA GGCCCCTCAC CACCTCTGCC TACCTCAG TG AAG TTC CCT TGT 8266 

GGG AGG CCC TGG AAG CGG ATG GAG AAG AAG CGC AGT CAC CTG AAA CGA 8314 

GAC ACA GAA GAC CAA GAA GAC CAA GTA GAT CCG CGG CTC ATT GAT GGG 8362 

AAG ATG ACC AGG CGG GGA GAC AGC CCC TGG CAG GTGGGAGGCG AGGCAGCACC 8415 

GGCTCGTCAC GTGCTGGGTC CGGGATCACT GAGTCCATCC TGGCAGCTAT GCTCAGGGTG 8475

CAGAAACCGA GAGGGAAGCG CTGCCATTGC GTTTGGGGGA TGATGAAGGT GGGGGATGCT 8535

TCAGGGAAAG ATGGACGCAA CCTGAGGGGA GAGGAGCAGC CAGGGTGGGT GAGGGGAGGG 8595 

GCATGGGGGC ATGGAGGGGT CTGCAGGAGG GAGGGTTACA GTTTCTAAAA AGAGCTGGAA 8655

AGACACTGCT CTGCTGGCGG GATTTTAGGC AGAAGCCCTG CTGATGGGAG AGGGCTAGGA 8715 

GGGAGGGCCG GGCCTGAGTA CCCCTCCAGC CTCCACATGG GAACTGACAC TTACTGGGTT 8775 

CCCCTCTCTG CCAGGCATGG GGGAGATAGG AACCAACAAG TGGGAGTATT TGCCCTGGGG 8835 

ACTCAGACTC TGCAAGGGTC AGGACCCCAA AGACCCGGCA GCCCAGTGGG ACCACAGCCA 8895 

GGACGGCCCT TCAAGATAGG GGCTGAGGGA GGCCAAGGGG AACATCCAGG CAGCCTGGGG 8955 

GCCACAAAGT CTTCCTGGAA GACACAAGGC CTGCCAAGCC TCTAAGGATG AGAGGAGCTC 9015 

GCTGGGCGAT GTTGGTGTGG CTGAGGGTGA CTGAAACAGT ATGAACAGTG CAGGAACAGC 9075 

ATGGGCAAAG GCAGGAAGAC ACCCTGGGAC AGGCTGACAC TGTAAAATGG GCAAAAATAG 9135 

AAAACGCCAG AAAGGCCTAA GCCTATGCCC ATATGACCAG GGAACCCAGG AAAGTGCATA 9195 

TGAAACCCAG GTGCCCTGGA CTGGAGGCTG TCAGGAGGCA GCCCTGTGAT GTCATCATCC 9255 

CACCCCATTC CAG GTG GTC CTG CTG GAC TCA AAG AAG AAG CTG GCC TGC 9304

GGG GCA GTG CTC ATC CAC CCC TCC TGG GTG CTG ACA GCG GCC CAC TGC 9352

ATG GAT GAG TCC AAG AAG CTC CTT GTC AGG CTT G GTATGGGCTG 9396 

GAGCCAGGCA GAAGGGGGCT GCCAGAGGCC TGGGTAGGGG GACCAGGCAG GCTGTTZAGG 9456 

TTTGGGGGAC CCCGCTCCCC AGGTGCTTAA GCAAGAGGCT TCTTGAGCTC CACAGA^GT 9516 

GTTTGGGGGG AAGAGGCCTA TGTGCCCCCA CCCTGCCCAC CCATGTACAC CCAGTATTTT 9576 

GCAGTAGGGG GTTCTCTGGT GCCCTCTTCG AATCTGGGCA CAGGTACCTG CACACA3ATG 9636 

TTTGTGAGGG GCTACACAGA CCTTCACCTC TCCACTCCCA CTCATGAGGA GCAGGCFGTG 9696 

TGGGCCTCAG CACCCTTGGG TGCAGAGACC AGCAAGGCCT GGCCTCAGGG CTGTGC3TCC 9756 

CACAGACTGA CAGGGATGGA GCTGTACAGA GGGAGCCCTA GCATCTGCCA AAGCCAZAAG 9816

CTGCTTCCCT AGCAGGCTGG GGGCTCCTAT GCATTGGCCC CGATCTATGG CAATTTCTGG 9876

AGGGGGGGTC TGGCTCAACT CTTTATGCCA AAAAGAAGGC AAAGCATATT GAGAAAGGCC 9936 

AAATTCACAT TTCCTACAGC ATAATCTATG CCAGTGGCCC CGTGGGGCTT GGCTTAGAAT 9996 

TCCCAGGTGC TCTTCCCAGG GAACCATCAG TCTGGACTGA GAGGACCTTC TCTCTCAGGT 10056

GGGACCCGGC CCTGTCCTCC CTGGCAGTGC CGTGTTCTGG GGGTCCTCCT CTCTGGGTCT 10116 

CACTGCCCCT GGGGTCTCTC CAGCTACCTT TGCTCCATGT TCCTTTGTGG CTCTGGTCTG 10176 

TGTCTGGGGT TTCCAGGGGT CTCGGGCTTC CCTGCTGCCC ATTCCTTCTC TGGTCTCACG 10236 

GCTCCGTGAC TCCTGAAAAC CAACCAGCAT CCTACCCCTT TGGATTGACA CCTGTTGGCC 10296 

ACTCCTTCTG GCAGGAAAAG TCACCGTTGA TAGGGTTCCA CGGCATAGAC AGGTGGCTCC 10356 

GCGCCAGTGC CTGGGACGTG TGGGTGCACA GTCTCCGGGT GAACCTTCTT CAGGCCCTCT 10416 

CCCAGGCCTG CAGGGGCACA GCAGTGGGTG GGCCTCAGGA AAGTGCCACT GGGGAGAGGC 10476 

TCCCCGCAGC CCACTCTGAC TGTGCCCTCT GCCCTGCAG GA GAG TAT GAC CTG 10529 

CGG CGC TGG GAG AAG TGG GAG CTG GAC CTG GAC ATC AAG GAG GTC TTC 10577 

GTC CAC CCC AAC TAC AGC AAG AGC ACC ACC GAC AAT GAC ATC GCA CTG 10625 

CTG CAC CTG GCC CAG CCC GCC ACC CTC TCG CAG ACC ATA GTG CCC ATC 10673 

TGC CTC CCG GAC AGC GGC CTT GCA GAG CGC GAG CTC AAT CAG GCC GGC 10721 

CAG GAG ACC CTC GTG ACG GGC TGG GGC TAC CAC AGC AGC CGA GAG AAG 10769 

GAG GCC AAG AGA AAC CGC ACC TTC GTC CTC AAC TTC ATC AAG ATT CCC 10817 

GTG GTC CCG CAC AAT GAG TGC AGC GAG GTC ATG AGC AAC ATG GTG TCT 10865 

GAG AAC ATG CTG TGT GCG GGC ATC CTC GGG GAC CGG CAG GAT GCC TGC 10913 

GAG GGC GAC AGT GGG GGG CCC ATG GTC GCC TCC TTC CAC GGC ACC TGG 10961 

TTC CTG GTG GGC CTG GTG AGC TGG GGT GAG GGC TGT GGG CTC CTT CAC 11009

AAC TAC GGC GTT TAC ACC AAA GTC AGC CGC TAC CTC GAC TGG ATC CAT 11057

GGG CAC ATC AGA GAC AAG GAA GCC CCC CAG AAG AGC TGG GCA CCT 11102 

TAGCGACCCT CCCTGCAGGG CTGGGCTTTT GCATGGCAAT GGATGGGACA TTAAA3GGAC 11162 

ATGTAACAAG CACACCGGCC TGCTGTTCTG TCCTTCCATC CCTCTTTTGG GCTCTfCTGG 11222 

AGGGAAGTAA CATTTACTGA GCACCTGTTG TATGTCACAT GCCTTATGAA TAGAATCTTA 11282 

ACTCCTAGAG CAACTCTGTG GGGTGGGGAG GAGCAGATCC AAGTTTTGCG GGGTCTAAAG 11342 

CTGTGTGTGT TGAGGGGGAT ACTCTGTTTA TGAAAAAGAA TAAAAAACAC AACCACGAAG 11402 

CCACTAGAGC CTTTTCCAGG GCTTTGGGAA GAGCCTGTGC AAGCCGGGGA TGCTGAAGGT 11462

GAGGCTTGAC CAGCTTTCCA GCTAGCCCAG CTATGAGGTA GACATGTTTA GCTCATATCA 11522 

CAGAGGAGGA AACTGAGGGG TCTGAAAGGT TTACATGGTG GAGCCAGGAT TCAAA" r CTAG 11582 

GTCTGACTCC AAAACCCAGG TGCTTTTTTC TGTTCTCCAC TGTCCTGGAG GACAGCTGTT 11642 

TCGACGGTGC TCAGTGTGGA GGCCACTATT AGCTCTGTAG GGAAGCAGCC AGAGACCCAG 11702 

AAAGTGTTGG TTCAGCCCAG AAT 11725 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 460 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Trp Gln Leu Thr Ser Leu Leu Leu Phe Val Ala Thr Trp Gly Ile
1 5 10 15

Ser Gly Thr Pro Ala Pro Leu Asp Ser Val Phe Ser Ser Ser Glu Arg
20 25 30

Ala His Gln Val Leu Arg Ile Arg Lys Arg Ala Asn Ser Phe Leu Glu
35 40 45

Glu Leu Arg His Ser Ser Leu Glu Arg Glu Cys Ile Glu Glu Ile Cys
50 55 60

Asp Phe Glu Glu Ala Lys Glu Ile Phe Gln Asn Val Asp Asp Thr Leu
65 70 75 80

Ala Phe Trp Ser Lys His Val Asp Gly Asp Gln Cys Leu Val Leu Pro
85 90 95

Leu Glu His Pro Cys Ala Ser Leu Cys Cys Gly His Gly Thr Cys Ile
100 105 110

Asp Gly Ile Gly Ser Phe Ser Cys Asp Cys Arg Ser Gly Trp Glu Gly
115 120 125

Arg Phe Cys Gln Arg Glu Val Ser Phe Leu Asn Cys Ser Leu Asp Asn
130 135 140

Gly Gly Cys Thr His Tyr Cys Leu Glu Glu Val Gly Trp Arg Arg Cys
145 150 155 160

Ser Cys Ala Pro Gly Tyr Lys Leu Gly Asp Asp Leu Leu Gln Cys His
165 170 175

Pro Ala Val Lys Phe Pro Cys Gly Arg Pro Trp Lys Arg Met Glu Lys
180 185 190

Lys Arg Ser His Leu Lys Arg Asp Thr Glu Asp Gln Glu Asp Gln Val
195 200 205

Asp Pro Arg Leu Ile Asp Gly Lys Met Thr Arg Arg Gly Asp Ser Pro
210 215 220

Trp Gln Val Val Leu Leu Asp Ser Lys Lys Lys Leu Ala Cys Gly Ala
225 230 235 240

Val Leu Ile His Pro Ser Trp Val Leu Thr Ala Ala His Cys Met Asp
245 250 255

Glu Ser Lys Lys Leu Leu Val Arg Leu Gly Glu Tyr Asp Leu Arg Arg
260 265 270

Trp Glu Lys Trp Glu Leu Asp Leu Asp Ile Lys Glu Val Phe Val His
275 280 285

Pro Asn Tyr Ser Lys Ser Thr Thr Asp Asn Asp Ile Ala Leu Leu His
290 295 300

Leu Ala Gln Pro Ala Thr Leu Ser Gln Thr Ile Val Pro Ile Cys Leu
305 310 315 320

Pro Asp Ser Gly Leu Ala Glu Arg Glu Leu Asn Gln Ala Gly Gln Glu
325 330 335

Thr Leu Val Thr Gly Trp Gly Tyr His Ser Ser Arg Glu Lys Glu Ala
340 345 350

Lys Arg Asn Arg Thr Phe Val Leu Asn Phe Ile Lys Ile Pro Val Val
355 360 365

Pro His Asn Glu Cys Ser Glu Val Met Ser Asn Met Val Ser Glu Asn
370 375 380

Met Leu Cys Ala Gly Ile Leu Gly Asp Arg Gln Asp Ala Cys Glu Gly
385 390 395 400

Asp Ser Gly Gly Pro Met Val Ala Ser Phe His Gly Thr Trp Phe Leu
405 410 415

Val Gly Leu Val Ser Trp Gly Glu Gly Cys Gly Leu Leu His Asn Tyr
420 425 430

Gly Val Tyr Thr Lys Val Ser Arg Tyr Leu Asp Trp Ile His Gly His
435 440 445

Ile Arg Asp Lys Glu Ala Pro Gln Lys Ser Trp Ala
450 455 460



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1386 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1380 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:



ATG TGG CAG CTC ACA AGC CTC CTG CTG TTC GTG GCC ACC TGG GGA ATT 48
Met Trp Gln Leu Thr Ser Leu Leu Leu Phe Val Ala Thr Trp Gly Ile
1 5 10 15

TCC GGC ACA CCA GCT CCT CTT GAC TCA GTG TTC TCC AGC AGC GAG CGT 96
Ser Gly Thr Pro Ala Pro Leu Asp Ser Val Phe Ser Ser Ser Glu Arg
20 25 30

GCC CAC CAG GTG CTG CGG ATC CGC AAA CGT GCC AAC TCC TTC CTG GAG 144
Ala His Gln Val Leu Arg Ile Arg Lys Arg Ala Asn Ser Phe Leu Glu
35 40 45

GAG CTC CGT CAC AGC AGC CTG GAG CGG GAG TGC ATA GAG GAG ATC TGT 192
Glu Leu Arg His Ser Ser Leu Glu Arg Glu Cys Ile Glu Glu Ile Cys
50 55 60

GAC TTC GAG GAG GCC AAG GAA ATT TTC CAA AAT GTG GAT GAC ACA CTG 240
Asp Phe Glu Glu Ala Lys Glu Ile Phe Gln Asn Val Asp Asp Thr Leu
65 70 75 80

GCC TTC TGG TCC AAG CAC GTC GAC GGT GAC CAG TGC TTG GTC TTG CCC 288
Ala Phe Trp Ser Lys His Val Asp Gly Asp Gln Cys Leu Val Leu Pro
85 90 95

TTG GAG CAC CCG TGC GCC AGC CTG TGC TGC GGG CAC GGC ACG TGC ATC 336
Leu Glu His Pro Cys Ala Ser Leu Cys Cys Gly His Gly Thr Cys Ile
100 105 110

GAC GGC ATC GGC AGC TTC AGC TGC GAC TGC CGC AGC GGC TGG GAG GGC 384
Asp Gly Ile Gly Ser Phe Ser Cys Asp Cys Arg Ser Gly Trp Glu Gly
115 120 125

CGC TTC TGC CAG CGC GAG GTG AGC TTC CTC AAT TGC TCT CTG GAC AAC 432
Arg Phe Cys Gln Arg Glu Val Ser Phe Leu Asn Cys Ser Leu Asp Asn
130 135 140

GGC GGC TGC ACG CAT TAC TGC CTA GAG GAG GTG GGC TGG CGG CGC TGT 480
Gly Gly Cys Thr His Tyr Cys Leu Glu Glu Val Gly Trp Arg Arg Cys
145 150 155 160

AGC TGT GCG CCT GGC TAC AAG CTG GGG GAC GAC CTC CTG CAG TGT CAC 528
Ser Cys Ala Pro Gly Tyr Lys Leu Gly Asp Asp Leu Leu Gln Cys His
165 170 175

CCC GCA GTG AAG TTC CCT TGT GGG AGG CCC TGG AAG CGG ATG GAG AAG 576
Pro Ala Val Lys Phe Pro Cys Gly Arg Pro Trp Lys Arg Met Glu Lys
180 185 190

AAG CGC AGT CAC CTG AAA CGA GAC ACA GAA GAC CAA GAA GAC CAA GTA 624
Lys Arg Ser His Leu Lys Arg Asp Thr Glu Asp Gln Glu Asp Gln Val
195 200 205

GAT CCG CGG CTC ATT GAT GGG AAG ATG ACC AGG CGG GGA GAC AGC CCC 672
Asp Pro Arg Leu Ile Asp Gly Lys Met Thr Arg Arg Gly Asp Ser Pro
210 215 220

TGG CAG GTG GTC CTG CTG GAC TCA AAG AAG AAG CTG GCC TGC GGG GCA 720
Trp Gln Val Val Leu Leu Asp Ser Lys Lys Lys Leu Ala Cys Gly Ala
225 230 235 240

GTG CTC ATC CAC CCC TCC TGG GTG CTG ACA GCG GCC CAC TGC ATG GAT 768
Val Leu Ile His Pro Ser Trp Val Leu Thr Ala Ala His Cys Met Asp
245 250 255

GAG TCC AAG AAG CTC CTT GTC AGG CTT GGA GAG TAT GAC CTG CGG CGC 816
Glu Ser Lys Lys Leu Leu Val Arg Leu Gly Glu Tyr Asp Leu Arg Arg
260 265 270

TGG GAG AAG TGG GAG CTG GAC CTG GAC ATC AAG GAG GTC TTC GTC CAC 864
Trp Glu Lys Trp Glu Leu Asp Leu Asp Ile Lys Glu Val Phe Val His
275 280 285

CCC AAC TAC AGC AAG AGC ACC ACC GAC AAT GAC ATC GCA CTG CTG CAC 912
Pro Asn Tyr Ser Lys Ser Thr Thr Asp Asn Asp Ile Ala Leu Leu His
290 295 300

CTG GCC CAG CCC GCC ACC CTC TCG CAG ACC ATA GTG CCC ATC TGC CTC 960
Leu Ala Gln Pro Ala Thr Leu Ser Gln Thr Ile Val Pro Ile Cys Leu
305 310 315 320

CCG GAC AGC GGC CTT GCA GAG CGC GAG CTC AAT CAG GCC GGC CAG GAG 1008
Pro Asp Ser Gly Leu Ala Glu Arg Glu Leu Asn Gln Ala Gly Gln Glu
325 330 335

ACC CTC GTG ACG GGC TGG GGC TAC CAC AGC AGC CGA GAG AAG GAG GCC 1056
Thr Leu Val Thr Gly Trp Gly Tyr His Ser Ser Arg Glu Lys Glu Ala
340 345 350

AAG AGA AAC CGC ACC TTC GTC CTC AAC TTC ATC AAG ATT CCC GTG GTC 1104
Lys Arg Asn Arg Thr Phe Val Leu Asn Phe Ile Lys Ile Pro Val Val
355 360 365

CCG CAC AAT GAG TGC AGC GAG GTC ATG AGC AAC ATG GTG TCT GAG AAC 1152
Pro His Asn Glu Cys Ser Glu Val Met Ser Asn Met Val Ser Glu Asn
370 375 380

ATG CTG TGT GCG GGC ATC CTC GGG GAC CGG CAG GAT GCC TGC GAG GGC 1200
Met Leu Cys Ala Gly Ile Leu Gly Asp Arg Gln Asp Ala Cys Glu Gly
385 390 395 400

GAC AGT GGG GGG CCC ATG GTC GCC TCC TTC CAC GGC ACC TGG TTC CTG 1248
Asp Ser Gly Gly Pro Met Val Ala Ser Phe His Gly Thr Trp Phe Leu
405 410 415

GTG GGC CTG GTG AGC TGG GGT GAG GGC TGT GGG CTC CTT CAC AAC TAC 1296
Val Gly Leu Val Ser Trp Gly Glu Gly Cys Gly Leu Leu His Asn Tyr
420 425 430

GGC GTT TAC ACC AAA GTC AGC CGC TAC CTC GAC TGG ATC CAT GGG CAC 1344
Gly Val Tyr Thr Lys Val Ser Arg Tyr Leu Asp Trp Ile His Gly His
435 440 445

ATC AGA GAC AAG GAA GCC CCC CAG AAG AGC TGG GCA CCTTAG 1386
Ile Arg Asp Lys Glu Ala Pro Gln Lys Ser Trp Ala
450 455 460
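SEQ ID NO:3 pairs each codon with the three-letter amino acid it encodes. As an illustration only, the sketch below translates the first fifteen codons of that listing with the standard genetic code. The helper `translate` and the tables `CODON_TABLE` and `THREE_LETTER` are assumptions introduced here, not part of the listing, and "Ter" is used as an assumed label for stop codons.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter amino acid codes ("Ter" marks a stop codon).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna):
    """Translate a DNA string (whitespace ignored) codon by codon."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [THREE_LETTER[CODON_TABLE[dna[i:i + 3]]]
            for i in range(0, len(dna), 3)]


# First fifteen codons of SEQ ID NO:3 as printed in the listing.
first_codons = "ATG TGG CAG CTC ACA AGC CTC CTG CTG TTC GTG GCC ACC TGG GGA"
print(" ".join(translate(first_codons)))
```

The same function can be applied row by row to compare the nucleotide lines of SEQ ID NO:3 against the amino acid lines printed beneath them.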



(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 460 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Trp Gln Leu Thr Ser Leu Leu Leu Phe Val Ala Thr Trp Gly Ile
1 5 10 15

Ser Gly Thr Pro Ala Pro Leu Asp Ser Val Phe Ser Ser Ser Glu Arg
20 25 30

Ala His Gln Val Leu Arg Ile Arg Lys Arg Ala Asn Ser Phe Leu Glu
35 40 45

Glu Leu Arg His Ser Ser Leu Glu Arg Glu Cys Ile Glu Glu Ile Cys
50 55 60

Asp Phe Glu Glu Ala Lys Glu Ile Phe Gln Asn Val Asp Asp Thr Leu
65 70 75 80

Ala Phe Trp Ser Lys His Val Asp Gly Asp Gln Cys Leu Val Leu Pro
85 90 95

Leu Glu His Pro Cys Ala Ser Leu Cys Cys Gly His Gly Thr Cys Ile
100 105 110

Asp Gly Ile Gly Ser Phe Ser Cys Asp Cys Arg Ser Gly Trp Glu Gly
115 120 125

Arg Phe Cys Gln Arg Glu Val Ser Phe Leu Asn Cys Ser Leu Asp Asn
130 135 140

Gly Gly Cys Thr His Tyr Cys Leu Glu Glu Val Gly Trp Arg Arg Cys
145 150 155 160

Ser Cys Ala Pro Gly Tyr Lys Leu Gly Asp Asp Leu Leu Gln Cys His
165 170 175

Pro Ala Val Lys Phe Pro Cys Gly Arg Pro Trp Lys Arg Met Glu Lys
180 185 190

Lys Arg Ser His Leu Lys Arg Asp Thr Glu Asp Gln Glu Asp Gln Val
195 200 205

Asp Pro Arg Leu Ile Asp Gly Lys Met Thr Arg Arg Gly Asp Ser Pro
210 215 220

Trp Gln Val Val Leu Leu Asp Ser Lys Lys Lys Leu Ala Cys Gly Ala
225 230 235 240

Val Leu Ile His Pro Ser Trp Val Leu Thr Ala Ala His Cys Met Asp
245 250 255

Glu Ser Lys Lys Leu Leu Val Arg Leu Gly Glu Tyr Asp Leu Arg Arg
260 265 270

Trp Glu Lys Trp Glu Leu Asp Leu Asp Ile Lys Glu Val Phe Val His
275 280 285

Pro Asn Tyr Ser Lys Ser Thr Thr Asp Asn Asp Ile Ala Leu Leu His
290 295 300

Leu Ala Gln Pro Ala Thr Leu Ser Gln Thr Ile Val Pro Ile Cys Leu
305 310 315 320

Pro Asp Ser Gly Leu Ala Glu Arg Glu Leu Asn Gln Ala Gly Gln Glu
325 330 335

Thr Leu Val Thr Gly Trp Gly Tyr His Ser Ser Arg Glu Lys Glu Ala
340 345 350

Lys Arg Asn Arg Thr Phe Val Leu Asn Phe Ile Lys Ile Pro Val Val
355 360 365

Pro His Asn Glu Cys Ser Glu Val Met Ser Asn Met Val Ser Glu Asn
370 375 380

Met Leu Cys Ala Gly Ile Leu Gly Asp Arg Gln Asp Ala Cys Glu Gly
385 390 395 400

Asp Ser Gly Gly Pro Met Val Ala Ser Phe His Gly Thr Trp Phe Leu
405 410 415

Val Gly Leu Val Ser Trp Gly Glu Gly Cys Gly Leu Leu His Asn Tyr
420 425 430

Gly Val Tyr Thr Lys Val Ser Arg Tyr Leu Asp Trp Ile His Gly His
435 440 445

Ile Arg Asp Lys Glu Ala Pro Gln Lys Ser Trp Ala
450 455 460

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10807 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

ACGCGTGTCG ACCTGCAGGT CAACGGATCT CTGTGTCTGT TTTCATGTTA GTACCACACT 60

GTTTTGGTGG CTGTAGCTTT CAGCTACAGT CTGAAGTCAT AAAGCCTGGT ACCTCCAGCT 120

CTGTTCTCTC TCAAGATTGT GTTCTGCTGT TTGGGTCTTT AGTGTCTCCA CACAATTTTT 180

AGAATTGTTT GTTCTAGTTC TGTGAAAAAT GATGCTGGTA TTTTGATAAG GATTGCATTG 240

AATCTGTAAA GCTACAGATA TAGTCATTGG GTAGTACAGT CACTTTAACA ATATTAACTC 300

TTCACATCTG TGAGCATGAT ATATTTTCCC CCTCTATATC ATCTTCAATT CCTCCTATCA 360

GTTTCTTTCA TTGCAGTTTT CTGAGTACAG GTCTTACACC TCCTTGGTTA GAGTCATTCC 420

TCAGTATTTT ATTCCTTTGA TACAATTGTG AATGAGGTAA TTTTCTTAGT TTCTCTTTCT 480

GATAGCTCAT TGTTAGTGTA TATATAGAAA AGCAACAGAT TTCTATGTAT TAATTTTGTA 540

TCCTGCAACA GATTTCTATG TATTAATTTT GTATCCTGCT ACTTTACGGA ATTCACTTAT 600

TAGCTTTTTG GTGACATCTT GAGGATTTTC TGAAGAAAAT GGCATGGTAT GGTAGGACAA 660

GGTGTCATGT CATCTGCAAA CAGTGGCAGT TTTCCTTCTT CCCTTCCAAC CTGGATTTCT 720

TTGATTTCTT TCTGTCTGAG TACGACTAGG ATTCCCAATA CTATACCGAA TAAAAGTGGC 780

AAGAGTGGAC ATCCTTGTCT TATTTTTCTG ACCTTAGAGG AAATGCTTTC AGTTTTTCAC 840

CATTAATTAT AATGTTTACT GTGGGCTTGT CATATGTGGC CTTCATTATA TGGAGGTCTA 900

TTCCCTCTAT ACCCACCTTG TTGAGAGTTT TTATCATAAA AGTATGTTGA ATTTTGTCAA 960

AAGTTTTTCC TGCATCTATT GAGATGATTT TTACTCTTCA ATTCATTAAT GATTTTTATT 1020

CTTCATTTTG TTAATGATTT CCATTCTTCA ATTTGTTAAC GTGGTATATC ACATTGATTG 1080

ATTTGTGGAT ACCTTTGTAT CCCTGGGATA AACCTCACTT GATCATGAGC TTTCAATGTA 1140 

TTTTTGAATT CACTTTGCTA ATATTCTGTT GGGTATTTTT GCATCTCTAT TCATCAATGA 1200 

TATTGGCCTA AGAAAGGTTT TGTCTGGTTT TAGTATCAGG GTGATGCTGG CCTCATAGAG 1260 

AGAGTTTAGA AGCATTTCCT CCTCTTTGAT TTTTCGGAAT AGTTTGAGTA GGATAGGTAT 1320 

TAACTCTTCT TTAAATGTTT GGGGACTTCC CTGGTGAGCC GGTGGTTGAG AATCCGCCTC 1380 

AGGGATGTGG GTTTGATCCC TGGTCAGGGA ACCATTAATA AGATCCCACA TGCTGCAGGC 1440 

AACAAGCCCC CAAGCTGCAA CCACTGAGCT GCAACCGCTG CAGTGCCCAC AGGCCACGAC 1500 

CAGAGAAAGC CCACATACAG CAGGGAAGAC CCAGCACAAC CGGAAAAAGG AGTTTGGTGG 1560 

AATACAGCTG TGAAGCCGTC TGGTCCTGGA CTCCTGCTTG AGGGAATTTT TTAAAAATTA 1620 

TTGATTCAAT TTCATTACTG GTAACTGGTC TGTTCATATT TTCTATTTCT TCCGGGTTCA 1680 

GTCTTGGGAG ATTGTACATG CCTAGGAATG TGTCCGTTTC TTCTAGGTTG TCCATTTTAT 1740 

TGGACATGCA TGGGAGCACA CAGCACCGAC CAGCGAGACT CATGCTGGCT TCCTGGGGCC 1800 

AGGCTGGGGC CCCAAGCAGC ATGGCATCCT AGAGTGTGTG AAAGCCCACT GACCCTGCCC 1860 

AGCCCCACAA TTTCATTCTG AGAAGTGATT CCTTGCTTCT GCACTTACAG GCCCAGGATC 1920 

TGACCTGCTT CTGAGGAGCA GGGGTTTTGG CAGGACGGGG AGATGCTGAG AGCCGACGGG 1980 

GGTCCAGGTC CCCTCCCAGG CCCCCCTGTC TGGGGCAGCC CTTGGGAAAG ATTGCCCCAG 2040 

TCTCCCTCCT ACAGTGGTCA GTCCCAGCTG CCCCAGGCCA GAGCTGCTTT ATTTCCGTCT 2100 

CTCTCTCTGG ATGGTATTCT CTGGAAGCTG AAGGTTCCTG AAGTTATGAA TAGCTTTGCC 2160 

CTGAAGGGCA TGGTTTGTGG TCACGGTTCA CAGGAACTTG GGAGACCCTG CAGCTCAGAC 2220 

GTCCCGAGAT TGGTGGCACC CAGATTTCCT AAGCTCGCTG GGGAACAGGG CGCTTGTTTC 2280 

TCCCTGGCTG ACCTCCCTCC TCCCTGCATC ACCCAGTTCT GAAAGCAGAG CGGTGCTGGG 2340

GTCACAGCCT CTCGCATCTA ACGCCGGTGT CCAAACCACC CGTGCTGGTG TTCGGGGGGC 2400
TACCTATGGG GAAGGGCTTC TCACTGCAGT GGTGCCCCCC GTCCCCTCTG AGATCAGAAG 2460 
TCCCAGTCCG GACGTCAAAC AGGCCGAGCT CCCTCCAGAG GCTCCAGGGA GGGATCCTTG 2520 
CCCCCCCGCT GCTGCCTCCA GCTCCTGGTG CCGCACCCTT GAGCCTGATC TTGTA3ACGC 2580 
CTCAGTCTAG TCTCTGCCTC CGTGTTCACA CGCCTTCTCC CCATGTCCCC TCCGT3TCCC 2640 
CGTTTTCTCT CACAAGGACA CCGGACATTA GATTAGCCCC TGTTCCAGCC TCACCTGAAC 2700 
AGCTCACATC TGTAAAGACC TAGATTCCAA ACAAGATTCC AACCTGAAGT TCCCGjTGGA 2760 

TGTGAGTTCT GGGGCGACAT CCTTCAACCC CATCACAGCT TGCAGTTCAT CGCAAVVCAT 2820 

GGAACCTGGG GTTTATCGTA AAACCCAGGT TCTTCATGAA ACACTGAGCT TCGAG3CTTG 2880 

TTGCAAGAAT TAAAGGTGCT AATACAGATC AGGGCAAGGA CTGAAGCTGG CTAAGOCTCC 2940 

TCTTTCCATC ACAGGAAAGG GGGGCCTGGG GGCGGCTGGA GGTCTGCTCC CGTGAGTGAG 3000 

CTCTTTCCTG CTACAGTCAC CAACAGTCTC TCTGGGAAGG AAACCAGAGG CCAGAGAGCA 3060 

AGCCGGAGCT AGTTTAGGAG ACCCCTGAAC CTCCACCCAA GATGCTGACC AGCCAGCGGG 3120 

CCCCCTGGAA AGACCCTACA GTTCAGGGGG GAAGAGGGGC TGACCCGCCA GGTCCCTGCT 3180 

ATCAGGAGAC ATCCCCGCTA TCAGGAGATT CCCCCACCTT GCTCCCGTTC CCCTATCCCA 3240 

ATACGCCCAC CCCACCCCTG TGATGAGCAG TTTAGTCACT TAGAATGTCA ACTGAAGGCT 3300 

TTTGCATCCC CTTTGCCAGA GGCACAAGGC ACCCACAGCC TGCTGGGTAC CGACGCCCAT 3360 

GTGGATTCAG CCAGGAGGCC TGTCCTGCAC CCTCCCTGCT CGGGCCCCCT CTGTGOTCAG 3420 

CAACACACCC AGCACCAGCA TTCCCGCTGC TCCTGAGGTC TGCAGGCAGC TCGCTGTAGC 3480 

CTGAGCGGTG TGGAGGGAAG TGTCCTGGGA GATTTAAAAT GTGAGAGGCG GGAGG'GGGA 3540 

GGTTGGGCCC TGTGGGCCTG CCCATCCCAC GTGCCTGCAT TAGCCCCAGT GCTGC XAGC 3600 

CGTGCCCCCG CCGCAGGGGT CAGGTCACTT TCCCGTCCTG GGGTTATTAT GACTC TGTC 3660 

ATTGCCATTG CCATTTTTGC TACCCTAACT GGGCAGCAGG TGCTTGCAGA GCCCTCGATA 3720

CCGACCAGGT CCTCCCTCGG AGCTCGACCT GAACCCCATG TCACCCTTGC CCCAGCCTGC 3780

AGAGGGTGGG TGACTGCAGA GATCCCTTCA CCCAAGGCCA CGGTCACATG GTTTGGAGGA 3840

GCTGGTGCCC AAGGCAGAGG CCACCCTCCA GGACACACCT GTCCCCAGTG CTGGCTCTGA 3900

CCTGTCCTTG TCTAAGAGGC TGACCCCGGA AGTGTTCCTG GCACTGGCAG CCAGCCTGGA 3960

CCCAGAGTCC AGACACCCAC CTGTGCCCCC GCTTCTGGGG TCTACCAGGA ACCGTCTAGG 4020

CCCAGAGGGG ACTTCCTGCT TGGCCTTGGA TGGAAGAAGG CCTCCTATTG TCCTCGTAGA 4080

GGAAGCCACC CCGGGGCCTG AGGATGAGCC AAGTGGGATT CCGGGAACCG CGTGGCTGGG 4140

GGCCCAGCCC GGGCTGGCTG GCCTGCATGC CTCCTGTATA AGGCCCCAAG CCTGCTGTCT 4200

CAGCCCTCCA CTCCCTGCAG AGCTCAGAAG CACGACCCCA GGGATATCCC TGCAGCCATG 4260

AAGTGCCTCC TGCTTGCCCT GGGCCTGGCC CTCGCCTGTG GCGTCCAGGC CATCATCGTC 4320

ACCCAGACCA TGAAAGGCCT GGACATCCAG AAGGTTCGAG GGTTGGCCGG GTGGGTGAGT 4380

TGCAGGGCGG GCAGGGGAGC TGGGCCTCAG AGAGCCAAGA GAGGCTGTGA CGTTGGGTTC 4440

CCATCAGTCA GCTAGGGCCA CCTGACAAAT CCCCGCTGGG GCAGCTTCAA CCAGGCGTTC 4500

ACTGTCTTGC ATTCTGGAGG CTGGAAGCCC AAGATCCAGG TGTTGGCAGG GCTGGCTTCT 4560

CCTGCGGCCG CTCTCTGGGG AGCAGACGGC CGTCTTCTCC AGTCCTCTGC GCGCCCTGAT 4620

TTCCTCTTCC TGTGAGGCCA CCAGGCCTGC TGGAAACACG CCTGCCTGCG CAGCTTCACA 4680

CGACCTTTGT CATCTCTTTA AAGGCCATGT CTCCAGAGTC ATGTGTTGAA GTTCTGGGGG 4740

TTAGTGGGAC ACAGTTCAGC CCCTAAAAGA GTCTCTCTGC CCCTCAAATT TTCCCCACCT 4800

CCAGCCATGT rTcrrrAAGA TfTAAATGTT [illegible] TfTGGGTfff 4860

TCTTTGGGTT CAGTGTGAGT CTGGGGAGAG CATTCCCCAG GGTGCAGAGT TGGGGGGAGT 4920

ATCTCAGGGC TGCCCAGGCC GGGGTGGGAC AGAGAGCCCA CTGTGGGGCT GGGGGCCCCT 4980

TCCCACCCCC AGAGTGCAAC TCAAGGTCCC TCTCCAGGTG GCGGGGACTT GGCACTCCTT 5040

GGCTATGGCG GCCAGCGACA TCTCCCTGCT GGATGCCCAG AGTGCCCCCC TGAGAGTGTA 5100

CGTGGAGGAG CTGAAGCCCA CCCCCGAGGG CAACCTGGAG ATCCTGCTGC AGAAATGGTG 5160 

GGCGTCTCTC CCCAACATGG AACCCCCACT CCCCAGGGCT GTGGACCCCC CGGGGGGTGG 5220

GGTGCAGGAG GGACCAGGGC CCCAGGGCTG GGGAAGAGGG CTCAGAGTTT ACTGGTACCC 5280 

GGCGCTCCAC CCAAGGCTGC CCACCCAGGG CTTTTTTTTT TTTTAAACTT TTATTAATTT 5340

GATGCTTCAG AACATCATCA AACAAATGAA CATAAAACAT TCATTTTTGT TTACTTGGAA 5400 

GGGGAGATAA AATCCTCTGA AGTGGAAATG CATAGCAAAG ATACATACAA TGAGGCAGGT 5460 

ATTCTGAATT CCCTGTTAGT CTGAGGATTA CAAGTGTATT TGAGCAACAG AGAGACATTT 5520 

TCATCATTTC TAGTCTGAAC ACCTCAGTAT CTAAAATGAA CAAGAAGTCC TGGAAACGAA 5580 

GCAGTGTGGG GATAGGCCCG TGTGAAGGCT GCTGGGAGGC AGCAGACCTG GGTCTTCGGG 5640 

CTCAAGCAGT TCCCGCTACC AGCCCTGTCC ACCTCAGACG GGGGTCAGGG TGCAG3AGAG 5700 

AGCTGGATGG GTGTGGGGGC AGAGATGGGG ACCTGAACCC CAGGGCTGCC TTTTG3GGGT 5760 

GCCTGTGGTC AAGGCTCTCC CTGACCTTTT CTCTCTGGCT TCATCTGACT TCTCCrGGCC 5820 

CATCCACCCG GTCCCCTGTG GCCTGAGGTG ACAGTGAGTG CGCCGAGGCT AGTTGGCCAG 5880 

CTGGCTCCTA TGCCCATGCC ACCCCCCTCC AGCCCTCCTG GGCCAGCTTC TGCCCCTGGC 5940 

CCTCAGTTCA TCCTGATGAA AATGGTCCAT GCCAATGGCT CAGAAAGCAG CTGTCTTTCA 6000 

GGGAGAACGG CGAGTGTGCT CAGAAGAAGA TTATTGCAGA AAAAACCAAG ATCCCTGCGG 6060 

TGTTCAAGAT CGATGGTGAG TCCGGGTCCC TGGGGGACAC CCACCACCCC CGCCCCCGGG 6120 

GACTGTGGAC AGGTTCAGGG GGCTGGCGTC GGGCCCTGGG ATGCTAAGGG ACTGGTGGTG 6180 

ATGAAGACAC TGCCTTGACA CCTGCTTCAC TTGCCTCCCC TGCCACCTGC CCGGGGCCTT 6240 

GGGGCGGTGG CCATGGGCAG GTCCCGGCTG GCGGGCTAAC CCACCAGGGT GACACCCGAG 6300 

CTCTCTTTGC TGGGGGGCGG GCGGTGCTCT GGGCCCTCAG GCTGAGCTCA GGAGGTACCT 6360 

GTGCCCTCCC AGGGGTAACC GAGAGCCGTT GCCCACTCCA GGGGCCCAGG TGCCCCACGA 6420 
CCCCAGCCCG CTCCACAGCT CCTTCATCTC CTGGAGACAA ACTCTGTCCG CCCTCGCTCA 6480

TTCACTTGTT CGTCCTAAAT CCGAGATGAT AAAGCTTCGA GGGGGGGTTG GGGTTCCATC 6540 

AGGGCTGCCC TTCCGCCGGG CAGCCTGGGC CACATCTGCC CTTGGCCCCC TCAGGACTCA 6600 

CTCTGACTGG AGGCCCTGCA CTGACTGACG CCAGGGTGCC CAGCCCAGGG TCTCTGGCGC 6660 

CATCCAGCTG CACTGGGTTT GGGTGCTGGT CCTGCCCCCA AGCTGCCCGG ACACCACAGG 6720 

CAGCCGGGGC TGCCCACTGG CCTCGGTCAG GGTGAGCCCC AGCTGCCCCC GCTCAGGGCT 6780 

TGCCCCGACA ATGACCCCAT CCTCAGGACG CACCCCCCTT CCCTTGCTGG GCAGTGTCCA 6840 

GCCCCACCCG AGATCGGGGG AAGCCCTATT TCTTGACAAC TCCAGTCCCT GGGGGAGGGG 6900 

GCCTCAGACT GAGTGGTGAG TGTTCCCAAG TCCAGGAGGT GGTGGAGGGT CCTGGCGGAT 6960 

CCAGAGTTGA CAGTGAGGGC TTCCTGGGCC CCATGCGCCT GGCAGTGGCA GCAGGGAAGA 7020 

GGAAGCACCA TTTCAGGGGT GGGGGATGCC AGAGGCGCTC CCCACCCCGT CTTCGCCGGG 7080 

TGGTGACCCC GGGGGAGCCC CGCTGGTCGT GGAGGGTGCT GGGGGCTGAC TAGCAACCCC 7140 

TCCCCCCCCG TTGGAACTCA CTTTTCTCCC GTCTTGACCG CGTCCAGCCT TGAATGAGAA 7200 

CAAAGTCCTT GTGCTGGACA CCGACTACAA AAAGTACCTG CTCTTCTGCA TGGAAAACAG 7260 

TGCTGAGCCC GAGCAAAGCC TGGCCTGCCA GTGCCTGGGT GGGTGCCAAC CCTGGCTGCC 7320 

CAGGGAGACC AGCTGCGTGG TCCTTGCTGC AACAGGGGGT GGGGGGTGGG AGCTTGATCC 7380 

CCAGGAGGAG GAGGGGTGGG GGGTCCCTGA GTCCCGCCAG GAGAGAGTGG TCGCATACCG 7440 

GGAGCCAGTC TGCTGTGGGC CTGTGGGTGG CTGGGGACGG GGGCCAGACA CACAGGCCGG 7500 

GAGACGGGTG GGCTGCAGAA CTGTGACTGG TGTGACCGTC GCGATGGGGC CGGTGGTCAC 7560 

TGAATCTAAC AGCCTTTGTT ACCGGGGAGT TTCAATTATT TCCCAAAATA AGAACTCAGG 7620 

TACAAAGCCA TCTTTCAACT ATCACATCCT GAAAACAAAT GGCAGGTGAC ATTTTCTGTG 7680 

CCGTAGCAGT CCCACTGGGC ATTTTCAGGG CCCCTGTGCC AGGGGGGCGC GGGCATCGGC 7740 
GAGTGGAGGC TCCTGGCTGT GTCAGCCGGC CCAGGGGGAG GAAGGGACCC GGACAGCCAG 7800

AGGTGGGGGG CAGGCTTTCC CCCTGTGACC TGCAGACCCA CTGCACTGCC CTGGGAGGAA 7860 

GGGAGGGGAA CTAGGCCAAG GGGGAAGGGC AGGTGCTCTG GAGGGCAAGG GCAGACCTGC 7920 

AGACCACCCT GGGGAGCAGG GACTGACCCC CGTCCCTGCC CCATAGTCAG GACCCCGGAG 7980 

GTGGACAACG AGGCCCTGGA GAAATTCGAC AAAGCCCTCA AGGCCCTGCC CATGCACATC 8040 

CGGCTTGCCT TCAACCCGAC CCAGCTGGAG GGTGAGCACC CAGGCCCCGC CCTTCZCCAG 8100 

GGCAGGAGCC ACCCGGCCCC GGGACGACCT CCTCCCATGG TGACCCCCAG CTCCC:AGGC 8160 

CTCCCAGGAG GAAGGGGTGG GGTGCAGCAC CCCGTGGGGG CCCCCTCCCC ACCCCZTGCC 8220 

AGGCCTCTCT TCCCGAGGTG TCCAGTCCCA TCCTGACCCC CCCATGACTC TCCCTCCCCC 8280 

ACAGGGCAGT GCCACGTCTA GGTGAGCCCC TGCCGGTGCC TCTGGGGTAA GCTGCCTGCC 8340 

CTGCCCCACG TCCTGGGCAC ACACATGGGG TAGGGGGTCT TGGTGGGGCC TGGGACCCCA 8400 

CATCAGGCCC TGGGGTCCCC CCTGTGAGAA TGGCTGGAAG CTGGGGTCCC TCCTGGCGAC 8460 

TGCAGAGCTG GCTGGCCGCG TGCCACTCTT GTGGGTGACC TGTGTCCTGG CCTCACACAC 8520 

TGACCTCCTC CAGCTCCTTC CAGCAGAGCT AAGGCTAAGT GAGCCAGAAT GGTACCTAAG 8580 

GGGAGGCTAG CGGTCCTTCT CCCGAGGAGG GGCTGTCCTG GAACCACCAG CCATGGAGAG 8640

GCTGGCAAGG GTCTGGCAGG TGCCCCAGGA ATCACAGGGG GGCCCCATGT CCATTTCAGG 8700

GCCCGGGAGC CTTGGACTCC TCTGGGGACA GACGACGTCA CCACCGCCCC CCCCCCATCA 8760

GGGGGACTAG AAGGGACCAG GACTGCAGTC ACCCTTCCTG GGACCCAGGC CCCTCCAGGC 8820

CCCTCCTGGG GCTCCTGCTC TGGGCAGCTT CTCCTTCACC AATAAAGGCA TAAACCTGTG 8880 

CTCTCCCTTC TGAGTCTTTG CTGGACGACG GGCAGGGGGT GGAGAAGTGG TGGGGAGGGA 8940 

GTCTGGCTCA GAGGATGACA GCGGGGCTGG GATCCAGGGC GTCTGCATCA CAGTCTTGTG 9000

ACAACTGGGG GCCCACACAC ATCACTGCGG CTCTTTGAAA CTTTCAGGAA CCAGGGAGGG 9060 

ACTCGGCAGA GACATCTGCC AGTTCACTTG GAGTGTTCAG TCAACACCCA AACTCCACAA 9120 
AGGACAGAAA GTGGAAAATG GCTGTCTCTT AGTCTAATAA ATATTGATAT GAAACTCAAG 9180

TTGCTCATGG ATCAATATGC CTTTATGATC CAGCCAGCCA CTACTGTCGT ATCAACTCAT 9240

GTACCCAAAC GCACTGATCT GTCTGGCTAA TGATGAGAGA TTCCCAGTAG AGAGCTGGCA 9300

AGAGGTCACA GTGAGAACTG TCTGCACACA CAGCAGAGTC CACCAGTCAT CCTAAGGAGA 9360

TCAGTCCTGG TGTTCATTGG AGGACTGATG TTGAAGCTGA AACTCCAATG CTTTGGCCAC 9420

CTGATGTGAA GAGCTGACTC ATTTGAAAAG ACCCTGATGC TGGGAAAGAT TGAGGGCAGG 9480

AGGAGAAGGG GACGACAGAG GATGAGATGG TTGGATGGCA TCACCAACAC AATGGACATG 9540

GGTTTGGGTG GACTCCAGGA GTTGGTGATG GACAGGGAGG CCTGGCGTGC TACGGAAGCG 9600

GTTTATGGGG TCACAAAGAC TGAGTGACTG AACTGAGCTG AACTGAATGG AAATGAGGTA 9660

TACAGCAAAG TGGGGATTTT TTAGATAATA AGAATATACA CATAACATAG TGTATACTCA 9720

TATTTTTATG CATACCTGAA TGCTCAGTCA CTCAGTCGTA TCTGACTCTG TGACCTATGG 9780

ACCGTAGCCT TCCAGGTTTC TTCTGTCCAC AGAATTCTCC AAGGCAAGAA TACTGGAGTG 9840

GGTAGCCATT TCCTCCTCCA GGGGATCCTC CCGACCCAGG GATTGAACCG GCATCTCCTG 9900

TATTGGCAGG TGGATTCTTT ACCACTGTGC CACCAGGGAA GCCCGTGTTA CTCTCTATGT 9960

CCCACTTAAT TACCAAAGCT GCTCCAAGAA AAAGCCCCTG TGCCCTCTGA GCTTCCCGGC 10020

CTGCAGAGGG TGGTGGGGGT AGACTGTGAC CTGGGAACAC CCTCCCGCTT CAGGACTCCC 10080

GGGCCACGTG ACCCACAGTC CTGCAGACAG CCGGGTAGCT CTGCTCTTCA AGGCTCATTA 10140

TCTTTAAAAA AAACTGAGGT CTATTTTGTG ACTTCGCTGC CGTAACTTCT GAACATCCAG 10200


TGCGATGGAC AGGACCTCCT CCCCAGGCCT CAGGGGCTTC AGGGAGCCAG CCTTCACCTA 10260

TGAGTCACCA GACACTCGGG GGTGGCCCCG CCTTCAGGGT GCTCACAGTC TTCCCATCGT 10320

CCTGATCAAA GAGCAAGACC AATGACTTCT TAGGAGCAAG CAGACACCCA CAGGACACTG 10380

AGGTTCACCA GAGCTGAGCT GTCCTTTTGA ACCTAAAGAC ACACAGCTCT CGAAGGTTTT 10440

CTCTTTAATC TGGATTTAAG GCCTACTTGC CCCTCAAGAG GGAAGACAGT CCTGCATGTC
CCCAGGACAG CCACTCGGTG GCATCCGAGG CCACTTAGTA TTATCTGACC GCACCCTGGA
ATTAATCGGT CCAAACTGGA CAAAAACCTT GGTGGGAAGT TTCATCCCAG AGGCCTCAAC
CATCCTGCTT TGACCACCCT GCATCTTTTT TTCTTTTATG TGTATGCATG TATATATATA
TATATATTTT TTTTTTTTTC ATTTTTTGGC TGTGCTGGCT GTTCGTTGCA GTTCGGTGCG
CAGGCTTCTC TCTAGTTTCT CTCTAGTCTT CTCTTATCAC AGAGCAGTCT CTAGACGATC
GACGCGT

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
AATTCCGATC GACGCGTCGA CGATATACTC TAGACGATCG ACGCGTA 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
AAGCTACGCG TCGATCGTCT AGAGTATATC GTCGACGCGT CGATCGG 
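
SEQ ID NO: 6 and SEQ ID NO: 7 are both 47 nucleotides long and read as the two strands of a single annealed linker. The sketch below checks that reading. Treating the two oligonucleotides as a pair, the 4-nucleotide overhang length, and the helper names `reverse_complement` and `paired_core` are assumptions made for illustration; the listing itself does not state them.

```python
# Minimal sketch: test whether two listed oligonucleotides can anneal
# to form a duplex with single-stranded 5' extensions.
# Assumption: a 4-nt 5' overhang on each strand (not stated in the listing).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def paired_core(top: str, bottom: str, overhang: int = 4):
    """Return the duplex region of each strand, both read along the top strand."""
    core_top = top[overhang:]
    core_bottom = reverse_complement(bottom[overhang:])
    return core_top, core_bottom

# Sequences copied from the listing with the block spacing removed.
SEQ6 = "AATTCCGATCGACGCGTCGACGATATACTCTAGACGATCGACGCGTA"
SEQ7 = "AAGCTACGCGTCGATCGTCTAGAGTATATCGTCGACGCGTCGATCGG"

core6, core7 = paired_core(SEQ6, SEQ7)
print(len(SEQ6), len(SEQ7))   # declared lengths are 47 each
print(core6 == core7)         # True when the two cores are fully complementary
print(SEQ6[:4], SEQ7[:4])     # the unpaired 5' extensions
```

Under these assumptions the two strands pair over 43 nucleotides and leave the 5' extensions AATT (SEQ ID NO: 6) and AAGC (SEQ ID NO: 7).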
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
TGGATCCCCT GCCGGTGCCT CTGG 
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
AACGCGTCAT CCTCTGTGAG CCAG 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: ZC6839 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
ACTACGTAGT 
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: ZC962 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
AGTCACCTGA GAAGAAAACG AGACA 
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: ZC6303 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
ATTTGCGGCC GCCTGCAGCC ATGTGGCAGC TCACAAGCCT CCTGC 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE:
(B) CLONE: ZC6337



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CAGGAAGGAG TTGGCGCGCT TGCGCCGTTG CAGCACCTGG TGGGC 45 
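
Each entry declares a LENGTH and then gives the sequence in blocks of ten, often followed by a running position number. A small consistency check, sketched below, counts the bases in the rows and flags any character that is not A, C, G or T. The function names and the damaged sample row are hypothetical, introduced only for this illustration.

```python
# Minimal sketch: compare a declared LENGTH with the bases actually listed,
# and report characters that cannot be nucleotides (for example scan damage).

VALID_BASES = set("ACGT")

def sequence_blocks(row: str) -> list[str]:
    """Split a listing row into sequence blocks, dropping a trailing position number."""
    blocks = row.split()
    if blocks and blocks[-1].isdigit():
        blocks = blocks[:-1]  # the last all-digit token is the position counter
    return blocks

def count_bases(rows: list[str]) -> int:
    """Total number of sequence characters across all rows."""
    return sum(len(block) for row in rows for block in sequence_blocks(row))

def invalid_symbols(rows: list[str]) -> list[str]:
    """Sorted list of characters that are not A, C, G or T."""
    found = {ch for row in rows for block in sequence_blocks(row)
             for ch in block if ch not in VALID_BASES}
    return sorted(found)

# Row copied from SEQ ID NO: 13, declared as 45 base pairs.
rows_seq13 = ["CAGGAAGGAG TTGGCGCGCT TGCGCCGTTG CAGCACCTGG TGGGC 45"]
print(count_bases(rows_seq13))        # compare with the declared LENGTH
print(invalid_symbols(rows_seq13))    # empty list when the row is clean

# Hypothetical damaged row, invented for this example only.
print(invalid_symbols(["ACGT7ACGTA 10"]))
```

A mismatch between the count and the declared LENGTH, or a non-empty list of invalid symbols, points at a row that needs to be checked against the original document.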
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: ZC6306 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CTTCTTCCTG AATTCTGTTT CTTGC 25 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: ZC6338 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CGGATCCGCA AGCGCGCCAA CTCCTTCC 28 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 



WO 97/20043 



78 



PCT/US96/18866 



(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: ZC6373



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
AAAGTAAAAA AAGATCTAAA AATTTAAC 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: ZC6305



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GTGTCTCGTT TTCTTCTTAA GTGACTGCGC TT 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: ZC6302 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
TTAAGAAGAA AACGAGACAC AGAAGACCAA GAAGACCAAG TAGATCCGC 49
(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(vii) IMMEDIATE SOURCE:
(B) CLONE: ZC6304 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGATCTACTT GGTCTTCTTG GTCTTCTGTG TCTCGTTTTC TTC 43 
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Arg Arg Lys Arg 
1 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Lys Arg Lys Arg 
1 
(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Ser His Leu Arg Arg Lys Arg Asp 
1 5 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6763 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

ACGCGTCGAC CTGCAGGTCA ACGGATCTCT GTGTCTGTTT TCATGTTAGT ACCACACTGT 60 

TTTGGTGGCT GTAGCTTTCA GCTACAGTCT GAAGTCATAA AGCCTGGTAC CTCCAGCTCT 120 

GTTCTCTCTC AAGATTGTGT TCTGCTGTTT GGGTCTTTAG TGTCTCCACA CMTTTTAG 180 

AATTGTTTGT TCTAGTTCTG TGAAAAATGA TGCTGGTATT TTGATAAGGA TTGCATTGAA 240

TCTGTAAAGC TACAGATATA GTCATTGGGT AGTACAGTCA CTTTAACAAT ATTAACTCTT 300

CACATCTGTG AGCATGATAT ATTTTCCCCC TCTATATCAT CTTCAATTCC TCCTATCAGT 360

TTCTTTCATT GCAGTTTTCT GAGTACAGGT CTTACACCTC CTTGGTTAGA GTCATXCTC 420 

AGTATTTTAT TCCTTTGATA CAATTGTGAA TGAGGTAATT TTCTTAGTTT CTCTTXTGA 480 

TAGCTCATTG TTAGTGTATA TATAGAAAAG CAACAGATTT CTATGTATTA ATTTTGTATC 540 
CTGCAACAGA TTTCTATGTA TTAATTTTGT ATCCTGCTAC TTTACGGAAT TCACTTATTA 600

GCTTTTTGGT GACATCTTGA GGATTTTCTG AAGAAAATGG CATGGTATGG TAGGACAAGG 660

TGTCATGTCA TCTGCAAACA GTGGCAGTTT TCCTTCTTCC CTTCCAACCT GGATTTCTTT 720 

GATTTCTTTC TGTCTGAGTA CGACTAGGAT TCCCAATACT ATACCGAATA AAAGTGGCAA 780 

GAGTGGACAT CCTTGTCTTA TTTTTCTGAC CTTAGAGGAA ATGCTTTCAG TTTTTCACCA 840 

TTAATTATAA TGTTTACTGT GGGCTTGTCA TATGTGGCCT TCATTATATG GAGGTCTATT 900 

CCCTCTATAC CCACCTTGTT GAGAGTTTTT ATCATAAAAG TATGTTGAAT TTTGTCAAAA 960 

GTTTTTCCTG CATCTATTGA GATGATTTTT ACTCTTCAAT TCATTAATGA TTTTTATTCT 1020 

TCATTTTGTT AATGATTTCC ATTCTTCAAT TTGTTAACGT GGTATATCAC ATTGATTGAT 1080 

TTGTGGATAC CTTTGTATCC CTGGGATAAA CCTCACTTGA TCATGAGCTT TCAATGTATT 1140 

TTTGAATTCA CTTTGCTAAT ATTCTGTTGG GTATTTTTGC ATCTCTATTC ATCAATGATA 1200 

TTGGCCTAAG AAAGGTTTTG TCTGGTTTTA GTATCAGGGT GATGCTGGCC TCATAGAGAG 1260 

AGTTTAGAAG CATTTCCTCC TCTTTGATTT TTCGGAATAG TTTGAGTAGG ATAGGTATTA 1320 

ACTCTTCTTT AAATGTTTGG GGACTTCCCT GGTGAGCCGG TGGTTGAGAA TCCGCCTCAG 1380 

GGATGTGGGT TTGATCCCTG GTCAGGGAAC CATTAATAAG ATCCCACATG CTGCAGGCAA 1440 

CAAGCCCCCA AGCTGCAACC ACTGAGCTGC AACCGCTGCA GTGCCCACAG GCCACGACCA 1500 

GAGAAAGCCC ACATACAGCA GGGAAGACCC AGCACAACCG GAAAAAGGAG TTTGGTGGAA 1560 

TACAGCTGTG AAGCCGTCTG GTCCTGGACT CCTGCTTGAG GGAATTTTTT AAAAATTATT 1620

GATTCAATTT CATTACTGGT AACTGGTCTG TTCATATTTT CTATTTCTTC CGGGTTCAGT 1680 

CTTGGGAGAT TGTACATGCC TAGGAATGTG TCCGTTTCTT CTAGGTTGTC CATTTTATTG 1740 

GACATGCATG GGAGCACACA GCACCGACCA GCGAGACTCA TGCTGGCTTC CTGGGGCCAG 1800 

GCTGGGGCCC CAAGCAGCAT GGCATCCTAG AGTGTGTGAA AGCCCACTGA CCCTGCCCAG 1860 

CCCCACAATT TCATTCTGAG AAGTGATTCC TTGCTTCTGC ACTTACAGGC CCAGGATCTG 1920 
ACCTGCTTCT GAGGAGCAGG GGTTTTGGCA GGACGGGGAG ATGCTGAGAG CCGACGGGGG 1980

TCCAGGTCCC CTCCCAGGCC CCCCTGTCTG GGGCAGCCCT TGGGAAAGAT TGCCCCAGTC 2040 

TCCCTCCTAC AGTGGTCAGT CCCAGCTGCC CCAGGCCAGA GCTGCTTTAT TTCCGTCTCT 2100 

CTCTCTGGAT GGTATTCTCT GGAAGCTGAA GGTTCCTGAA GTTATGAATA GCTT r GCCCT 2160 

GAAGGGCATG GTTTGTGGTC ACGGTTCACA GGAACTTGGG AGACCCTGCA GCTCAGACGT 2220 

CCCGAGATTG GTGGCACCCA GATTTCCTAA GCTCGCTGGG GAACAGGGCG CTTGTTTCTC 2280

CCTGGCTGAC CTCCCTCCTC CCTGCATCAC CCAGTTCTGA AAGCAGAGCG GTGCVGGGGT 2340 

CACAGCCTCT CGCATCTAAC GCCGGTGTCC AAACCACCCG TGCTGGTGTT CGGGGGGCTA 2400 

CCTATGGGGA AGGGCTTCTC ACTGCAGTGG TGCCCCCCGT CCCCTCTGAG ATCACiAAGTC 2460 

CCAGTCCGGA CGTCAAACAG GCCGAGCTCC CTCCAGAGGC TCCAGGGAGG GATCCTTGCC 2520 

CCCCCGCTGC TGCCTCCAGC TCCTGGTGCC GCACCCTTGA GCCTGATCTT GTAGACGCCT 2580 

CAGTCTAGTC TCTGCCTCCG TGTTCACACG CCTTCTCCCC ATGTCCCCTC CGTGTCCCCG 2640 

TTTTCTCTCA CAAGGACACC GGACATTAGA TTAGCCCCTG TTCCAGCCTC ACCTGAACAG 2700 

CTCACATCTG TAAAGACCTA GATTCCAAAC AAGATTCCAA CCTGAAGTTC CCGGTGGATG 2760

TGAGTTCTGG GGCGACATCC TTCAACCCCA TCACAGCTTG CAGTTCATCG CAAAACATGG 2820 

AACCTGGGGT TTATCGTAAA ACCCAGGTTC TTCATGAAAC ACTGAGCTTC GAGGCTTGTT 2880 

GCAAGAATTA AAGGTGCTAA TACAGATCAG GGCAAGGACT GAAGCTGGCT AAGCCTCCTC 2940 

TTTCCATCAC AGGAAAGGGG GGCCTGGGGG CGGCTGGAGG TCTGCTCCCG TGAGTGAGCT 3000 

CTTTCCTGCT ACAGTCACCA ACAGTCTCTC TGGGAAGGAA ACCAGAGGCC AGAGAGCAAG 3060 

CCGGAGCTAG TTTAGGAGAC CCCTGAACCT CCACCCAAGA TGCTGACCAG CCAGC3GGCC 3120 

CCCTGGAAAG ACCCTACAGT TCAGGGGGGA AGAGGGGCTG ACCCGCCAGG TCCCT3CTAT 3180 

CAGGAGACAT CCCCGCTATC AGGAGATTCC CCCACCTTGC TCCCGTTCCC CTATC3CAAT 3240 
ACGCCCACCC CACCCCTGTG ATGAGCAGTT TAGTCACTTA GAATGTCAAC TGAAGGCTTT 3300

TGCATCCCCT TTGCCAGAGG CACAAGGCAC CCACAGCCTG CTGGGTACCG ACGCCCATGT 3360

GGATTCAGCC AGGAGGCCTG TCCTGCACCC TCCCTGCTCG GGCCCCCTCT GTGCTCAGCA 3420

ACACACCCAG CACCAGCATT CCCGCTGCTC CTGAGGTCTG CAGGCAGCTC GCTGTAGCCT 3480

GAGCGGTGTG GAGGGAAGTG TCCTGGGAGA TTTAAAATGT GAGAGGCGGG AGGTGGGAGG 3540

TTGGGCCCTG TGGGCCTGCC CATCCCACGT GCCTGCATTA GCCCCAGTGC TGCTCAGCCG 3600

TGCCCCCGCC GCAGGGGTCA GGTCACTTTC CCGTCCTGGG GTTATTATGA CTCTTGTCAT 3660

TGCCATTGCC ATTTTTGCTA CCCTAACTGG GCAGCAGGTG CTTGCAGAGC CCTCGATACC 3720

GACCAGGTCC TCCCTCGGAG CTCGACCTGA ACCCCATGTC ACCCTTGCCC CAGCCTGCAG 3780

AGGGTGGGTG ACTGCAGAGA TCCCTTCACC CAAGGCCACG GTCACATGGT TTGGAGGAGC 3840

TGGTGCCCAA GGCAGAGGCC ACCCTCCAGG ACACACCTGT CCCCAGTGCT GGCTCTGACC 3900

TGTCCTTGTC TAAGAGGCTG ACCCCGGAAG TGTTCCTGGC ACTGGCAGCC AGCCTGGACC 3960

CAGAGTCCAG ACACCCACCT GTGCCCCCGC TTCTGGGGTC TACCAGGAAC CGTCTAGGCC 4020

CAGAGGGGAC TTCCTGCTTG GCCTTGGATG GAAGAAGGCC TCCTATTGTC CTCGTAGAGG 4080

AAGCCACCCC GGGGCCTGAG GATGAGCCAA GTGGGATTCC GGGAACCGCG TGGCTGGGGG 4140

CCCAGCCCGG GCTGGCTGGC CTGCATGCCT CCTGTATAAG GCCCCAAGCC TGCTGTCTCA 4200

GCCCTCCACT CCCTGCAGAG CTCAGAAGCA CGACCCCAGG GATATCATCG ATAAGCTTGG 4260

ATCCCCTGCC GGTGCCTCTG GGGTAAGCTG CCTGCCCTGC CCCACGTCCT GGGCACACAC 4320

ATGGGGTAGG GGGTCTTGGT GGGGCCTGGG ACCCCACATC AGGCCCTGGG GTCCCCCCTG 4380

TGAGAATGGC TGGAAGCTGG GGTCCCTCCT GGCGACTGCA GAGCTGGCTG GCCGCGTGCC 4440

ACTCTTGTGG GTGACCTGTG TCCTGGCCTC ACACACTGAC CTCCTCCAGC TCCTTCCAGC 4500

AGAGCTAAGG CTAAGTGAGC CAGAATGGTA CCTAAGGGGA GGCTAGCGGT CCTTCTCCCG 4560

AGGAGGGGCT GTCCTGGAAC CACCAGCCAT GGAGAGGCTG GCAAGGGTCT GGCAGGTGCC 4620

CCAGGAATCA CAGGGGGGCC CCATGTCCAT TTCAGGGCCC GGGAGCCTTG GACTCCTCTG 4680

GGGACAGACG ACGTCACCAC CGCCCCCCCC CCATCAGGGG GACTAGAAGG GACCAGGACT 4740 

GCAGTCACCC TTCCTGGGAC CCAGGCCCCT CCAGGCCCCT CCTGGGGCTC CTGCTCTGGG 4800 

CAGCTTCTCC TTCACCAATA AAGGCATAAA CCTGTGCTCT CCCTTCTGAG TCTTTGCTGG 4860

ACGACGGGCA GGGGGTGGAG AAGTGGTGGG GAGGGAGTCT GGCTCAGAGG ATGACAGCGG 4920 

GGCTGGGATC CAGGGCGTCT GCATCACAGT CTTGTGACAA CTGGGGGCCC ACACACATCA 4980 

CTGCGGCTCT TTGAAACTTT CAGGAACCAG GGAGGGACTG GGCAGAGACA TCTGCCAGTT 5040 

CACTTGGAGT GTTCAGTCAA CACCCAAACT CGACAAAGGA CAGAAAGTGG AAAATGGCTG 5100

TCTCTTAGTC TAATAAATAT TGATATGAAA CTCAAGTTGC TCATGGATCA ATATGCCTTT 5160 

ATGATCCAGC CAGCCACTAC TGTCGTATCA ACTCATGTAC CCAAACGCAC TGATCTGTCT 5220 

GGCTAATGAT GAGAGATTCC CAGTAGAGAG CTGGCAAGAG GTCACAGTGA GAACTGTCTG 5280

CACACACAGC AGAGTCCACC AGTCATCCTA AGGAGATCAG TCCTGGTGTT CATTGGAGGA 5340 

CTGATGTTGA AGCTGAAACT CCAATGCTTT GGCCACCTGA TGTGAAGAGC TGACTCATTT 5400

GAAAAGACCC TGATGCTGGG AAAGATTGAG GGCAGGAGGA GAAGGGGACG ACAGAGGATG 5460 

AGATGGTTGG ATGGCATCAC CAACACAATG GACATGGGTT TGGGTGGACT CCAGGAGTTG 5520 

GTGATGGACA GGGAGGCCTG GCGTGCTACG GAAGCGGTTT ATGGGGTCAC AAAGACTGAG 5580 

TGACTGAACT GAGCTGAACT GAATGGAAAT GAGGTATACA GCAAAGTGGG GATTTTTTAG 5640

ATAATAAGAA TATACACATA ACATAGTGTA TACTCATATT TTTATGCATA CCTGAATGCT 5700

CAGTCACTCA GTCGTATCTG ACTCTGTGAC CTATGGACCG TAGCCTTCCA GGTTTCTTCT 5760

GTCCACAGAA TTCTCCAAGG CAAGAATACT GGAGTGGGTA GCCATTTCCT CCTCCAGGGG 5820 

ATCCTCCCGA CCCAGGGATT GAACCGGCAT CTCCTGTATT GGCAGGTGGA TTCTTTACCA 5880

CTGTGCCACC AGGGAAGCCC GTGTTACTCT CTATGTCCCA CTTAATTACC AAAGCTGCTC 5940

CAAGAAAAAG CCCCTGTGCC CTCTGAGCTT CCCGGCCTGC AGAGGGTGGT GGGGGTAGAC 6000

TGTGACCTGG GAACACCCTC CCGCTTCAGG ACTCCCGGGC CACGTGACCC ACAGTCCTGC 6060 

AGACAGCCGG GTAGCTCTGC TCTTCAAGGC TCATTATCTT TAAAAAAAAC TGAGGTCTAT 6120 

TTTGTGACTT CGCTGCCGTA ACTTCTGAAC ATCCAGTGCG ATGGACAGGA CCTCCTCCCC 6180 

AGGCCTCAGG GGCTTCAGGG AGCCAGCCTT CACCTATGAG TCACCAGACA CTCGGGGGTG 6240 

GCCCCGCCTT CAGGGTGCTC ACAGTCTTCC CATCGTCCTG ATCAAAGAGC AAGACCAATG 6300 

ACTTCTTAGG AGCAAGCAGA CACCCACAGG ACACTGAGGT TCACCAGAGC TGAGCTGTCC 6360 

TTTTGAACCT AAAGACACAC AGCTCTCGAA GGTTTTCTCT TTAATCTGGA TTTAAGGCCT 6420 

ACTTGCCCCT CAAGAGGGAA GACAGTCCTG CATGTCCCCA GGACAGCCAC TCGGTGGCAT 6480 

CCGAGGCCAC TTAGTATTAT CTGACCGCAC CCTGGAATTA ATCGGTCCAA ACTGGACAAA 6540 

AACCTTGGTG GGAAGTTTCA TCCCAGAGGC CTCAACCATC CTGCTTTGAC CACCCTGCAT 6600 

CTTTTTTTCT TTTATGTGTA TGCATGTATA TATATATATA TATTTTTTTT TTTTTCATTT 6660

TTTGGCTGTG CTGGCTGTTC GTTGCAGTTC GGTGCGCAGG CTTCTCTCTA GTTTCTCTCT 6720 

AGTCTTCTCT TATCACAGAG CAGTCTCTAG ACGATCGACG CGT 6763 
(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Arg Ile Arg Lys Arg
1 5 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Gln Arg Arg Lys Arg
1 5 
CLAIMS

1. A method for producing protein C in a 
transgenic animal comprising: 

providing a DNA construct comprising a first DNA 
segment encoding a secretion signal and a protein C propeptide 
operably linked to a second DNA segment encoding protein C, 
wherein the encoded protein C comprises a two-chain cleavage
site modified from Lysine (Lys)-Arginine (Arg) to R1-R2-R3-R4,
and wherein each of R1, R2, R3, R4 is individually Lys or Arg,
and wherein said first and second segments are operably linked 
to additional DNA segments required for expression of the 
protein C DNA in a mammary gland of a host female animal; 

introducing said DNA construct into a fertilized egg 
of a non-human mammalian species;

inserting said egg into an oviduct or uterus of a 
female of said species to obtain offspring carrying said DNA 
construct; 

breeding said offspring to produce female progeny 
that express said first and second DNA segments and produce 
milk containing protein C encoded by said second segment, 
wherein said protein has anticoagulant activity upon 
activation; 

collecting milk from said female progeny; and 
recovering the protein C from the milk. 

2. The method of claim 1, further comprising the 
step of activating the protein C.

3. The method of claim 1, wherein R1-R2-R3-R4 is
Arg-Arg-Lys-Arg (SEQ ID NO: 20).
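
Claim 1 defines the modified cleavage site as R1-R2-R3-R4 with each position independently Lys or Arg, which gives 2^4 = 16 possible tetrapeptides. The sketch below enumerates them as an illustration of that arithmetic; the function name `cleavage_site_variants` is hypothetical and the code is not part of the claimed method.

```python
# Minimal sketch: enumerate every R1-R2-R3-R4 site in which each
# position is independently Lys (K) or Arg (R).
from itertools import product

THREE_LETTER = {"K": "Lys", "R": "Arg"}

def cleavage_site_variants() -> list[str]:
    """Return all tetrapeptides over {Lys, Arg} in three-letter notation."""
    return ["-".join(THREE_LETTER[residue] for residue in combo)
            for combo in product("KR", repeat=4)]

variants = cleavage_site_variants()
print(len(variants))                     # number of sites covered by the formula
print("Arg-Arg-Lys-Arg" in variants)     # the site named in claim 3 (SEQ ID NO: 20)
print("Lys-Arg-Lys-Arg" in variants)     # the site listed as SEQ ID NO: 21
```

Both tetrapeptides given in the sequence listing, SEQ ID NO: 20 and SEQ ID NO: 21, fall within the 16 combinations.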

4. The method of claim 1, wherein said species is 
selected from sheep, rabbits, cattle and goats. 






5. The method of claim 1, wherein each of said 
first and second DNA segments comprises an intron.

6. The method of claim 1, wherein the second DNA 
segment comprises a DNA sequence of nucleotides as shown in 
SEQ ID NO: 1 or SEQ ID NO: 3.

7. The method of claim 6, wherein the second DNA 
segment comprises the DNA sequence of nucleotides as shown in 
SEQ ID NO: 1.

8. The method of claim 1, wherein the additional 
DNA segments comprise a transcriptional promoter selected from 
the group consisting of casein, β-lactoglobulin, α-lactalbumin
and whey acidic protein gene promoters. 

9. The method of claim 8, wherein the 
transcriptional promoter is the β-lactoglobulin gene promoter.

10. A transgenic non-human female mammal that 
produces recoverable amounts of human protein C in its milk, 
wherein at least 90% of the human protein C in the milk is 
two-chain protein C. 

11. A process for producing a transgenic offspring 
of a mammal comprising: 

providing a DNA construct comprising a first DNA 
segment encoding a secretion signal and a protein C propeptide 
operably linked to a second DNA segment encoding protein C, 
wherein the encoded protein C comprises a two-chain cleavage
site modified from Lys-Arg to R1-R2-R3-R4, and wherein each of
R1, R2, R3, R4 is individually Lys or Arg, and wherein said
first and second segments are operably linked to additional
DNA segments required for expression of the protein C DNA in
the mammary gland of a host female animal;

introducing said DNA construct into a fertilized egg
of a non-human mammalian species; and 

inserting said egg into an oviduct or uterus of a 
female of said species to obtain offspring carrying said DNA 
construct.

12. The process according to claim 11, wherein
R1-R2-R3-R4 is Arg-Arg-Lys-Arg (SEQ ID NO: 20).

13. The process according to claim 11, wherein the 
offspring is female. 

14. The process according to claim 11, wherein the 
offspring is male. 

15. A non-human mammal produced according to the 
process of claim 10. 

16. A non-human mammal of claim 15, wherein the 
mammal is female.

17. A female mammal according to claim 16 that 
produces milk containing protein C encoded by said DNA 
construct, wherein said protein C has anticoagulant activity 
upon activation. 

18. A non-human mammalian embryo containing in its 
nucleus a heterologous DNA segment encoding protein C, wherein 
the encoded protein C comprises a two-chain cleavage site
modified from Lys-Arg to R1-R2-R3-R4, and wherein each of R1,
R2, R3, R4 is individually Lys or Arg.
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22-11-94 

21- 06-94 
19-12-95 



WO 


9211757 


A 


23-07-92 


AU 


1228592 


A 


17-08-92 










EP 


0591219 


A 


13-04-94 










JP 


6507307 


T 


25-08-94 










US 


5589604 


A 


31-12-96 


WO 


9634966 


A 


07-11-96 


AU 


6347496 


A 


21-11-96 
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